Theory Thursday: Pulling on the Andon Cord
What we can learn from the Toyota Production System
The Toyota Production System (TPS) pioneered the principle of lean manufacturing, which by its very nature focuses on the reduction of waste. Although originally applied to car manufacturing, it is now used in many different industries as a means to simplify processes and increase the value to the end user.
Software engineering is no exception: the industry has learnt a lot from the TPS. You can often see the following characteristics of TPS and lean manufacturing in software teams:
Defining Value
Understanding the customer’s need from their perspective
Things not contributing towards value are considered waste
Defining Value Streams
A holistic view of all the steps involved in delivering the value to the customer
Waste or inefficient steps are identified and improved or removed entirely
Creating Flow
A focus on avoiding disruptions in the process, ensuring there are limited (or no) bottlenecks
An aim of optimising the flow of information and work through the value stream
The ‘pull’ system
Production is based on the demand of the end user
A focus on reducing waste through overproduction, or in the case of software engineering, doing unnecessary work
Continuous Improvement (Kaizen)
Refinement of the processes as a whole, making small adjustments to improve overall efficiency
In an agile software team, a retrospective with suitable actions is a good example of this
The Andon Cord
Now we know about the Toyota Production System at a high level and the benefits it brings, let's talk about the main topic of this post - the Andon cord.
Within the TPS there is the Japanese concept of Jidoka (自働化), meaning “automation with a human touch”. The idea is that we should automate processes where we can, but also allow for, and encourage, human interactions where they are needed.
The Andon cord is a physical cord that exists in the Toyota production facilities, spanning the entire length of the production line. At any point along the line, the cord can be pulled to sound an alarm, flagging a problem within the process and bringing a shared focus to it.
The Andon provides several things:
1. Empowerment
Anybody within the TPS, regardless of their position in the business, has the ability to halt the entire production line if they find an issue. Issues could be anything from safety problems to manufacturing defects or mechanical faults.
The primary objective is to solve problems immediately, rather than pushing them further down the line. Pushing them on would not only increase waste, but would also let the issue get worse the longer it is left.
In the context of software engineering, this empowerment allows anybody to take responsibility for raising issues. People aren’t held back from voicing their concerns, and should never be penalised for calling out issues even if they turn out to be a false alarm. Incident management in software teams is a good example of this, where (in good cultures) everybody is encouraged to raise the alarm if they spot something out of the ordinary (e.g. a peculiar anomaly in traffic, which could be a spike in real users or a sign of a DDoS attack).
2. Real-time Problem Identification
When anybody pulls on the Andon cord, an alarm system made up of lights and sound makes everybody else aware that there is an issue. The feedback loop of this mechanism is short, so investigation and resolution can take place immediately without unnecessary delays.
The Andon in software teams can take many shapes and forms, but the really important thing is that the feedback loop is short.
A team might use dashboards with alerting over key metrics, or someone might spot a strange issue in production that nobody else has seen before. Either way, making the rest of the team aware is the Andon in this case.
Some companies will also utilise software like Incident.io which can help automate the incident management flow, alerting key personnel and setting up things like Slack channels and post-incident review templates.
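As a rough illustration of "alerting over key metrics", here is a minimal Python sketch of a check that decides whether a new reading is unusual enough to raise the alarm. The function name `should_pull_andon`, the three-standard-deviation threshold and the sample numbers are all assumptions made up for this example; real monitoring tools have their own, far richer, alerting rules.

```python
from statistics import mean, stdev


def should_pull_andon(history, latest, threshold=3.0):
    """Return True if `latest` sits more than `threshold` standard
    deviations away from the mean of `history` (hypothetical rule)."""
    if len(history) < 2:
        return False  # not enough data to judge what "normal" looks like
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is out of the ordinary
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Requests per minute over the last few minutes (made-up numbers)
recent_traffic = [100, 102, 98, 101, 99]

if should_pull_andon(recent_traffic, 500):
    print("Traffic anomaly: tell the team (real users or a DDoS?)")
```

The check itself is deliberately simple. What matters for the Andon is that the alert reaches the whole team quickly, not how clever the detection is.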
3. Continuous Improvement (Kaizen - 改善)
Kaizen (改善) means improvement in Japanese. It’s not just a word, but also a philosophy of continual improvement by which problems can be addressed, fixed and improved moving forwards. It’s very much a proactive means to maintain and ensure high-quality outputs instead of retroactively fixing things after they’ve already become a large issue.
Agile development teams are familiar with the idea of continuous improvement, namely through retrospectives whereby the team seeks to identify issues and fix them moving forwards. In the context of the Andon cord, it’s about halting the flow before things get progressively worse.
This could be as simple as realising there’s a critical defect in a deployment which is currently going through the pipelines, so halting this is key. The Kaizen attitude then means that things like missing test cases, monitoring and alerting are added to catch this in future.
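To make the "halt the pipeline" idea concrete, here is a small Python sketch, not tied to any real CI system. The stage names and the `run_pipeline` helper are hypothetical: each stage is just a function that returns True when it passes, and the first failure stops everything after it.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, check) pair; check() returns True when the stage passes
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Optional[str]:
    """Run stages in order. Return the name of the stage that halted
    the line, or None if every stage passed."""
    for name, check in stages:
        if not check():
            print(f"Andon pulled at '{name}': halting the pipeline")
            return name  # later stages never run
        print(f"'{name}' passed")
    return None


# Hypothetical stages for illustration only
stages = [
    ("unit tests", lambda: True),
    ("smoke tests", lambda: False),  # the critical defect is caught here
    ("deploy", lambda: True),
]

halted_at = run_pipeline(stages)
```

In this sketch, the Kaizen step would be adding a new check to the list (a missing test case, say) so that the same defect is caught automatically next time.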
4. Reducing Waste (Muda - 無駄)
Muda (無駄) means uselessness or waste - in other words, processes or materials that could have been avoided, saving time, cost or production effort.
When the Andon cord is pulled for the purpose of identifying Muda, it prevents the accumulation of defects/wasted effort and ultimately saves time that inevitably would have been spent fixing issues further down the line. Prevention is better than the cure.
Waste in any context is generally bad - it just means teams and individuals are putting their time (and hence money) into things which are potentially pointless.
A simple example in software engineering would be the use of release pipelines. Sure, you can manually deploy every single release, but that is time-consuming and prone to error. Using automated pipelines like GitHub Actions allows these things to be scripted and repeatable and, most of all, to require little to no manual input.
5. Collaboration and Transparency
The Andon cord creates a real sense of collaboration within the production culture. Ownership sits with everybody, irrespective of who pulled on the cord. This shared ownership means that everybody bands together to identify and resolve issues without falling into a blame culture, with a focus on understanding and addressing the root causes of issues.
Collaboration and communication are key, especially in software teams. Ambiguity and a culture of hiding problems only make the issue worse.
The collective ownership of having an Andon-type mindset ensures that software teams can work together with minds on the end goal of fixing issues, not pointing fingers. This mentality means people seek to understand, mitigate and fail forward with their approach, leading to a healthier team dynamic and ultimately a better end product.
Conclusion
So this week’s Theory Thursday was short and sweet, but hopefully a nice topic that anybody can use even if you’re not in the software or car manufacturing space.
Empower people to raise issues, work together to fix them, seek continual improvement and try to minimise waste wherever possible.
Come back next week when I dig into the Peter Principle.
Thanks for this Michael - I hadn't realised there were so many contextual aspects to the Andon Cord - you've just made what was a basic concept in my head a whole lot richer.