Keeping your house in order
Conducting continuous maintenance to fix the cracks and prevent future issues
At the time of writing we’re in the thick of Storm Éowyn, and sadly my garden is already taking a battering. Yet again, the sole wooden fence panel at the rear of my garden has decided to dislodge itself and fall into our neighbour’s garden.
My previous attempts involved simply slotting it back in, followed by additional bodges which involved screwing extra bits of plywood on to help secure it. I’m trying my best to draw out an analogy which allows me to segue into a theme about maintaining teams and software, but let’s just keep it simple.
Maintenance is continuous
Just like my fence panel which faces the constant battering of the British weather system, teams and software also begin to degrade over time.
Processes become inefficient, quality can begin to fade and the resulting outputs are often sub-par. This is where maintenance of teams and the software they write is crucial as a perpetual process. But what exactly can you do to perform ‘maintenance’ on teams or software?
Teams
When I say “teams” I mean it both in the literal sense of a group of software engineers, product owners, agile coaches etc, but also in the wider sense of cross-functional and collaborating groups of people.
Retrospectives
Retrospectives, an agile ceremony, allow teams and individuals to do some solid reflection. This is a massively important thing for making changes as individuals and as a wider team; an opportunity to dig deep into what works, what doesn’t work and any opportunities you might be missing out on.
Tip 💡: Don’t fall into the trap of holding retros for the sake of it. Record actions, hold each other to account and treat the actions as first-class citizens within your workflows to ensure they don’t keep getting deprioritised.
Define your team APIs / contracts
When interacting with other teams, I think it’s important to set up some ground rules (so to speak) which allow you to form a contract or API. Now I don’t mean this in the literal sense of an actual software API, but more of an agreement on how the rest of the business should interface with your team.
The bigger the business is, the more important this becomes as the challenges are heightened. This can involve putting things in place such as:
Support request forms (in JIRA, Google Forms etc.) to simplify request workflows and remove dependencies on people being contacted
A definition of what services your team provides, some basic self-serve support information and points of escalation
Clear communication channels such as in Slack or Teams (🤮)
Learning and Development
Often overlooked by most businesses and teams, learning and development is a massive one for maintaining a team’s effectiveness. A little investment in the knowledge of people can result in continued innovation, alongside the obvious benefit of personal growth of the individual.
Celebrate Success, Reflect on Failure
Having a culture of celebrating wins is massively positive for individual and team morale. Singing your own praises is a great way to get some recognition for the hard work done in the team, and in my eyes one of the more important things seen in a well-maintained team. There’s nothing more damaging than burnt-out, demoralised engineers who have no sense of accomplishment in their work.
Similarly, it is equally important to be candid about failure. This isn’t about finger pointing, but about providing a chance for reflection and healthy feedback to one another. Poorly-maintained teams stagnate, never learn from their mistakes and become inefficient as a result.
Software
Software is a different beast to people, but can be equally difficult to maintain. There is also a huge knock-on effect at play whereby a lack of maintenance around people and processes can result in poor quality software. Equally, terribly-written software can be so cumbersome and painful to work with that people bend their usual ways of working, telling themselves that because a system is ‘legacy’ it doesn’t require as much love and attention - wrong!
Refactoring
I could write until the cows come home about this topic, so I’ll keep it short. Maintaining software requires frequent restructuring of the code to make it:
better architected to the changing requirements
easier to understand
free of dead code
I’ll save myself reinventing the wheel on this one, but a cracking resource I’ve always pointed people to is Refactoring.guru - a great resource for many languages with clear examples.
Addressing Technical Debt
Technical debt is another subject that can be discussed at great length. In short, it’s the ‘debt’ incurred from making certain technical decisions that ultimately will result in re-work further down the line. It’s not always possible to invest up-front in the ideal approach, but as time goes on things can become painful to work with unless they are addressed (we just spoke about refactoring…).
The key with tech debt is to:
make it visible
factor it into your workstreams
prioritise it effectively
tackle some as part of other tickets where possible, chipping away
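One cheap way to start making debt visible is to count the markers people already leave behind in the code. The snippet below is only a rough sketch of that idea: the TODO / FIXME / HACK convention, the function name and the example source are all my own assumptions, and a real team would more likely feed this kind of count into tickets or a dashboard.

```python
import re
from collections import Counter

# Assumed convention: debt is flagged in comments as TODO, FIXME or HACK.
DEBT_MARKER = re.compile(r"#\s*(TODO|FIXME|HACK)\b")


def count_debt_markers(source: str) -> Counter:
    """Count debt markers in a block of source code, grouped by marker type."""
    return Counter(match.group(1) for match in DEBT_MARKER.finditer(source))


# Made-up source code, purely for illustration.
example_source = """
def total(prices):
    # TODO: handle currencies other than GBP
    # FIXME: rounding errors on large baskets
    return sum(prices)  # TODO: use Decimal
"""

summary = count_debt_markers(example_source)
```

A count like this won’t tell you what to prioritise, but tracking it over time gives you something concrete to bring to planning.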
Maintain Documentation
It’s a common myth that documentation doesn’t need to exist when working in agile development environments - this is a load of crap. Instead we should think of things in terms of ‘just enough’ documentation.
The level of documentation required simply comes down to being pragmatic. Think about:
who is the audience?
what is the key information they need to know?
what format makes sense? e.g. a README.md or RUNBOOK.md over a more comprehensive Confluence page
do diagrams need to be included?
The other important thing is to make sure that such documentation is living. By this I mean consider documentation changes as being as important as the code changes themselves. When there’s a lack of synchronisation between documentation and the software, things can become pretty messy and confusing.
How many times have you been going through the new-starter onboarding documentation and found that something doesn’t exist or hit a plethora of errors that aren’t documented anywhere? I guarantee you have had a few!
Consider Optimisations
Working software is great, but it’s also dynamic and can experience varying levels of usage and other conditions. As such, 9 times out of 10 it isn’t a case of simply letting software exist to do its job.
Using data points such as latency, CPU utilisation, memory allocation etc you can build a picture of the performance of your system. You may find that bottlenecks begin to appear, sparking an opportunity to maintain your system through optimisations. Although not a new feature for your customer, looking at opportunities to make things faster or even more cost-efficient for the business is always a win.
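As a small illustration of turning raw data points into a picture of performance, here is a sketch that computes percentile latencies using the nearest-rank method. The latency figures are made up and the percentile helper is my own, so treat it as a starting point rather than how any particular monitoring tool does it.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of a non-empty collection of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


# Made-up request latencies in milliseconds, for illustration only.
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 14]

p50 = percentile(latencies_ms, 50)  # the typical request
p95 = percentile(latencies_ms, 95)  # the slow tail
```

Looking at the tail as well as the middle matters here: a single very slow request barely moves the median, but it is exactly the kind of thing that points you towards a bottleneck.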
Consider things like:
can you add caching layers anywhere to reduce loads on databases?
are there any Platform as a Service (PaaS) offerings which can reduce some of the overheads of maintenance?
are your VMs powerful enough to handle the traffic, and conversely are they too large and can be scaled back?
do you have load balancing in place with auto-scaling groups to handle peaks and troughs automatically?
The list goes on…
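To make the caching point a little more concrete, here is a minimal sketch of an in-memory cache with a time-to-live sitting in front of a slow lookup. TTLCache and fetch_user_from_db are hypothetical names of my own, and in practice you would more likely reach for a dedicated cache service than roll your own.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-memory cache that remembers values for a fixed number of seconds."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float) -> None:
        self._loader = loader  # function that hits the slow backing store
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = time.monotonic()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # cache hit: no database call
        value = self._loader(key)  # cache miss or expired entry
        self._entries[key] = (now, value)
        return value


db_calls: List[str] = []  # records every time the "database" is hit


def fetch_user_from_db(user_id: str) -> str:
    """Hypothetical stand-in for a slow database query."""
    db_calls.append(user_id)
    return f"user-{user_id}"


cache = TTLCache(fetch_user_from_db, ttl_seconds=60)
first = cache.get("42")   # goes to the database
second = cache.get("42")  # served from the cache
```

The trade-off is staleness: the longer the time-to-live, the fewer database calls you make, but the longer a changed record can go unnoticed.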
Patching
Another important thing that is easy to get pushed down the list - patching.
Third-party libraries and frameworks that you use are getting updated just as frequently as your own code. Security vulnerabilities are found, performance enhancements are made, new frameworks are released and all for good reason. However, if you don’t update your own references to these then you’re setting yourself up to experience some levels of pain further down the line.
For example, one company I worked with had kept putting off moving from AngularJS to Angular despite its end of life looming. This ended up resulting in costly third-party support contracts to ensure they were covered in the event of future security vulnerabilities beyond the end-of-life from the original creator.
TIP 💡: There are modern tools that can be integrated into pipelines which automatically check for dependency updates and auto-generate pull requests with the changes in. This takes away 90% of the work for you, so consider using them (e.g. Dependabot on GitHub).
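Conceptually, these tools compare what you have pinned against what is currently available. The sketch below shows that comparison in its simplest form; the package names and versions are invented, the "latest" versions are hard-coded rather than fetched from a registry, and it assumes plain dotted numeric versions only.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.8.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_outdated(pinned: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """List the pinned dependencies that have a newer version available."""
    outdated = []
    for name, current in sorted(pinned.items()):
        newest = latest.get(name)
        if newest is not None and parse_version(newest) > parse_version(current):
            outdated.append(f"{name}: {current} -> {newest}")
    return outdated


# Invented package names and versions, for illustration only.
pinned_versions = {"web-framework": "1.8.3", "http-client": "2.4.0"}
latest_versions = {"web-framework": "1.10.0", "http-client": "2.4.0"}

report = find_outdated(pinned_versions, latest_versions)
```

Note that versions are compared as tuples of numbers rather than as strings, otherwise 1.10.0 would wrongly sort before 1.8.3.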
Failing to maintain is choosing failure
So all of the previous points are far from exhaustive, but hopefully serve as a reminder for the things we should be thinking about in our teams and in our software.
Failing to maintain any of those things is simply like running on a treadmill of doom - you’re never going to escape the pains of poorly-maintained people or systems. Closing your eyes and pretending there are no issues is a sure way to drive you, your colleagues and the product into the ground even if everything seems fine now.
Having worked on well-maintained systems and some shockingly-poor ones, the difference really is night and day. When you suddenly have to respond to a random high-pressure incident, you will quickly appreciate a system that has up-to-date runbooks, is architected simply and is observable.
Make your choices wisely.