Bug fixing: Five tricks we can learn from doctors

I had a bit of a health scare this week and a trip to A+E (ER). All’s OK now, but the trip made me realise how similar the “bug fixing” the great doctors and nurses were attempting on me is to the way a good engineer addresses a problem. Most of these concepts work in any field of engineering, but I’m going to focus on IT Operations.

#1 Symptoms and Cause

It’s important to remember the difference between symptoms and cause. Treating backache with painkillers will be useful in the short term, but you’ve got to identify what’s causing the pain: posture, your desk chair, etc.

Understanding the root cause of the issue should be your ultimate goal. In the short term, though, treating the symptoms might be the best way to get your system back up and running quickly.

#2 Monitoring

Both trend monitoring and threshold monitoring are amazingly important when it comes to identifying and resolving issues. This is why patients are so often hooked up to pulse, ECG, blood pressure monitors and why key readings are recorded regularly.

In engineering, perhaps the CPU usage of the server you’re working on looks high. Is it normally this high? Is it trending up or down?
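As a rough illustration of the two kinds of check, here is a minimal Python sketch. The function names, the sample readings and the 90% limit are all made up for the example; in practice a monitoring tool would collect the readings and do this for you.

```python
from statistics import mean


def exceeds_threshold(samples, limit):
    """Threshold check: is the most recent reading above a fixed limit?"""
    return samples[-1] > limit


def trend_slope(samples):
    """Trend check: least-squares slope of the readings, per sample interval.

    A positive value means the metric is rising, a negative value falling.
    """
    n = len(samples)
    if n < 2:
        return 0.0  # not enough history to talk about a trend
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(samples))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical CPU readings (percent), oldest first.
cpu_percent = [40, 45, 50, 55, 60]

print(exceeds_threshold(cpu_percent, limit=90))  # False: no alert yet
print(trend_slope(cpu_percent))                  # 5.0: climbing 5 points per sample
```

The threshold check alone says everything is fine here; the trend says it won’t be for long. That is the reason for recording both.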

Be sure to use tools like Cacti, Ganglia or Nagios and graph everything that’s service or business critical. This could include technical data like CPU usage, connection counts and cache hit rates, as well as business data like user logins, registrations and eCommerce basket value. I’d argue that having a little too much data is far better than having too little.

#3 Triage

When you’re presented with multiple problems, you’ve got to identify which of them is most critical. Allow users to assign a priority, or assign one yourself in triage. Perhaps use a defect matrix to assign it according to how many users are affected, whether it’s on a production site and whether there’s a workaround.

This way you treat the most business critical problems first and not the ones that are most interesting!
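A defect matrix can be as simple as a scoring function. The sketch below is one possible version in Python; the `Defect` fields, the weights and the cut-off numbers are assumptions for illustration, and you would tune them to your own business.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    title: str
    users_affected: int
    in_production: bool
    has_workaround: bool


def priority_score(defect):
    """Higher score means more urgent. Weights are illustrative only."""
    score = 0
    if defect.users_affected >= 100:
        score += 3
    elif defect.users_affected >= 10:
        score += 2
    elif defect.users_affected > 0:
        score += 1
    if defect.in_production:
        score += 3
    if not defect.has_workaround:
        score += 2
    return score


def triage(defects):
    """Return the defects ordered from most to least urgent."""
    return sorted(defects, key=priority_score, reverse=True)


# Hypothetical queue of open problems.
queue = [
    Defect("Typo on staging page", 3, in_production=False, has_workaround=True),
    Defect("Checkout fails", 500, in_production=True, has_workaround=False),
    Defect("Slow report export", 40, in_production=True, has_workaround=True),
]

for defect in triage(queue):
    print(priority_score(defect), defect.title)
```

With these made-up weights the checkout failure comes out on top, however interesting the slow export might be to dig into.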

#4 Case history

Doctors will talk with you about when the problem first started and ask related questions that might help with their diagnosis. Good bug reports are often critical for you to be able to fully understand and replicate the bug. It’s important that the reporter of the bug either understands this through training or is forced to give detailed info in the reporting process. PHP’s Report a Bug page is a reasonably good example of the latter.
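“Forcing” detail can be as simple as refusing a report until the key questions are answered. Here is a small Python sketch of that idea; the field names are hypothetical and are not taken from PHP’s form or any particular ticketing system.

```python
# Hypothetical set of questions every bug report must answer.
REQUIRED_FIELDS = (
    "summary",
    "steps_to_reproduce",
    "expected_result",
    "actual_result",
    "first_noticed",
)


def missing_fields(report):
    """Return the required fields that are absent or blank in the report."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = report.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


report = {"summary": "Login fails", "steps_to_reproduce": "   "}
print(missing_fields(report))
```

A reporting form would reject the submission, and say which answers are still needed, while the returned list is non-empty.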

If you can, keep some kind of history of changes and problems relating to a device or system; it can be really valuable. A searchable bug/ticketing system is somewhere close to self-documenting, and I’d strongly recommend version control of all server configuration files.

#5 Double-checking

If you’re getting nowhere with a diagnosis, get a second opinion. If, after getting a second opinion, you’re no closer to identifying the problem, it could be worth the second engineer going through the same steps of diagnosis that you did rather than just taking your word for it. Sometimes a second set of eyes will spot something subtle that was easy to miss.

Happy bug fixing!


About James Cohen
LAMP geek with interests in building scalable web applications

One Response to Bug fixing: Five tricks we can learn from doctors

  1. Mike Pearce says:

    Some good comparisons there, although I’d be wary of taking the “Symptoms/Cause” analogy too far – remember, before there were antibiotics to treat gangrene, there were amputations!
