Goodbye toil


A lot of my work lands in the difficult middle: production issues in code I didn't write, upgrades that have been avoided for years, legacy systems that nobody fully understands.

That's not unusual. Most staff engineers and tech leads spend more time there than they'd prefer.

For valued production systems, being an early tool adopter carries too much risk. By late 2025, though, the experience reports had changed: the tools had got much better. So in January, I decided to find out for myself, using the real work that clogs up and slows down product improvement. This is what I learnt.

Tooling choices

I wanted to see how LLMs and LLM-based tech could help me get stuff done. I chose the basic tools, nothing over-complex: direct conversations with Claude and GitHub Copilot. I didn't look to use any clever agentic SDLCs or agentic, out-of-the-loop working. That can come once I trust the basics.

Notable experiences

Firefighting and improvement

Legacy code fixing. We had some serious production issues in some legacy code. I used Claude to help me debug and solve them. I was able to swiftly gather suggestions for the root causes of the problems and to generate the missing test automation.
It absolutely sped me up, both in analysis and in action, with the test automation removing a big chunk of toil and leaving the code in a much better state. I did need to take Claude's suggestions with a pinch of salt, overlaying my own understanding of the situation rather than just going where Claude wanted to go.

Cross-service bugs. To fix a complex bug, I worked with Claude to develop production code across two services that I had minimal experience of. I worked much faster than I would have done alone, and with much more confidence.

Toil removal

Upgrades and dependency management. I upgraded a legacy JS configuration library that had not been maintained for 2 years, so that it could be adopted and owned by a new team. Claude and I started from a library that didn't build and that carried out-of-date expectations of Node, build scripts and toolchains. Downstream consumers set constraints that were hard to navigate, but in less than a day the library was working, a new version was released and the team was happy to adopt it.

Tricky dependency upgrade conundrums. Huge monoliths often present complex upgrade issues. With Claude, a number of these were solved swiftly and needed little attention from me: Claude did the work from a prompt. Where upgrading linters had broken the developer experience, Claude could swiftly generate additional canary tests to stop the same upgrade issues breaking main again.

Legacy updates. I upgraded and rebuilt a 20-year-old WordPress website. The WP theme had long since dropped out of update cycles, and that blocked critical PHP updates. Claude and I also worked to make the old site perform better again. It was 2 days of work that I had put off or hacked around for years.

Generating boilerplate. I built a new specialist-subject blogging website in a single evening. Otherwise this would have meant either many evenings of fiddling around and learning new tools, or paying a WordPress host some money and then still spending many evenings fiddling around.
Using Claude, I spent an evening building something free, based on GitHub Pages and Eleventy, which let me focus on creation, not boilerplate and toil.

Editing

Claude as editor. Claude acted as editor for a new conference talk and associated writings. This actually made the production take longer, but the quality of the work is much higher than I would have managed alone. Claude spotted bad formatting and grammar, as well as repetition, missed emphasis and overworked themes.

Creating slide designs. Claude started well and then massively fumbled, causing me a lot of frustration. Claude was very good at selecting styles and themes, but when it came to layouts and any complexity it consistently over-promised and under-delivered. The conversation was far more valuable than the output.

Validating pitches against requirements. Claude checked my conference pitches for clarity and for how well they matched the conference briefs. It caught several beats I'd missed, so the pitches met the Call for Speakers far better.

What did I learn?

Claude really paid off when working on 'toil': the work engineers and teams need to do that gets in the way of the craft, meaning true improvements and innovation. I cleaned up and solved pain points fluidly.

I could also use Claude well to dive into problems in areas new to me, learning about the code base, adding missing parts like tests and thinking around problems. I moved fast and with confidence.

Claude did far better than Copilot in any head-to-head challenges I set them both. Copilot often picked up a faulty angle or came up with low-quality solutions, and seemed to retain little context through a thread. It was always a battle. I stopped using Copilot for anything but repetitive tasks and trusted Claude's output far more.

Both Claude and GitHub Copilot struggled with adjusting complex GitHub Actions for system builds. Without easy sandboxes, and with gaps in the documentation, they had some of the same problems I had had previously. I still think we got it done faster than I would have managed without them.

Anger and frustration with tools happen, but boy did I get cross with Claude. Somehow the way it makes promises is especially frustrating when it under-delivers, and it's easy to get into corrective loops when a different approach would be a better move.

There were times when I needed to filter Claude's views through my own experience and knowledge, and times when I wished I'd verified what looked like a safe change. This highlights the continued need for feedback loops: both engineers assessing decisions near-the-loop and test automation to steer in-the-loop.

Surprises

Claude did some stupid, unexpected things, but far fewer than I expected. The only really bad one came the day before a conference talk: I asked for the styles to be changed, and Claude replaced a working subscribe widget with a fake one.

Claude selected some really good defaults several times: building a website out of open source components rather than generating code, and suggesting Markdown-based slide generators rather than trying to be clever with Google Sheets.

The bigger surprise was how useful Claude was as an editorial partner. I need to be careful not to allow Claude to rewrite my words, but working with a consistent editorial framework (via a few skills and contexts) really helps me make progress towards a new, higher standard.

Conclusions

I read a lot about how Claude can remove the need for engineers. That might be so. What I am confident of is that it can clear out the toil and speed up the firefighting that most teams need to tackle week to week, freeing up engineers to focus on product improvement and innovation.

These LLMs bring new options to the table for long-term ownership and for managing maintenance costs and risks. If I had had these tools 2-3 years ago, I would have tackled some huge legacy challenges quite differently. I'm looking forward to applying them at scale.