Troubleshooting Methodology - A Great Interview Question
Yesterday, I responded to a forum question about CSS changes not appearing on a Drupal site. The user was frustrated that their updates were not appearing, even after clearing both their browser's and the Drupal site's caches. The subject of my response was "Start with Basic Troubleshooting" and it caused me to reflect on how this is one of my favorite questions when interviewing technical candidates.
There isn't a coder or administrator alive who gets everything right, the first time, every time. Invariably, all of us need to troubleshoot somethin' that just ain't workin' quite right. In my mind, a competent and seasoned professional has a good troubleshooting methodology as a result of experience; therefore, I consider this a critical question for almost any technical position and have been continually amazed at some of the responses I get.
"You realize that something isn't working correctly. What steps would you take to resolve it?"
- "I'd look in the code for a typo"
- "I'd restart the server"
- "Check the log file"
- "Ask somebody to look at it with me"
- "Clear the cache"
Each of the above responses shares a common trait: they are fine as individual steps in a certain situation, but as the first response to my question they miss the point. Just like other important aspects of development or administration, good troubleshooting starts with a methodology: a repeatable sequence of steps you take to achieve your goals successfully.
Here is my general methodology for troubleshooting almost any issue with an application or system. These may seem obvious or even just plain common sense, but the point here is that these ARE obvious and common sense to good IT professionals.
Confirm that something is actually wrong
Before you do anything else, you need to make sure you really do have a problem. Patently obvious, huh? Raise your hand if you've ever had an "oh, it's supposed to do that?" moment. The next item is related to this idea; we're all human and sometimes we see (or don't see) things incorrectly.
Eliminate any PEBKAC errors first
What on earth is a PEBKAC error? It's a (somewhat derogatory) acronym in the help desk/support realms that stands for "Problem Exists Between Keyboard and Chair". Essentially, YOU are doing something wrong - the system is fine. I once spent 3 hours troubleshooting a report that wasn't showing any data, only to realize I had kept choosing the wrong filter criteria on the search form. Duh!
Identify only those areas which could potentially be causing the problem
Systems are complex, with many layers (network, operating system, web server, database server, framework, data access layer, object classes, services, client-side code, etc.). It's important to identify only those which can logically be the culprit and then remove the rest from consideration. This narrows the field and makes figuring out the problem much easier.
Another way to think of this is "Where is it most likely going wrong?"
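To make the narrowing idea concrete, here is a minimal sketch in Python. It walks an ordered list of layers, from the source outward, and reports the first one whose check fails. The function name, the layer names, and the stubbed checks are all hypothetical; in real life each check would be something you actually verify (open the file, request the URL, inspect the browser's network tab).

```python
from typing import Callable, List, Optional, Tuple

# Each entry is (layer name, check). A check returns True when that layer
# looks healthy. These are illustrative stand-ins, not real probes.
Check = Tuple[str, Callable[[], bool]]


def first_failing_layer(checks: List[Check]) -> Optional[str]:
    """Return the name of the first layer whose check fails, or None."""
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical walk-through for a "my CSS change isn't showing up" problem.
checks = [
    ("stylesheet edited on disk", lambda: True),
    ("web server serves the new file", lambda: False),
    ("browser loads the new file", lambda: False),
]

suspect = first_failing_layer(checks)
print("Start looking here:", suspect)
```

The order matters: once an early layer fails, everything downstream of it is suspect by default, so there is little point investigating those layers until the first failure is explained.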
Only change one thing at a time
Unless you are certain otherwise, make only one change and check if that fixed anything. If you change two, three or more, how are you sure which one of those actually fixed the problem? Argument number two for this: you may fix your initial problem, but then create another one in the process. Think iteratively!
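Here is a rough sketch of that loop in Python: apply one candidate change, test, and revert it if it made no difference before trying the next. Everything in it (the `find_fixing_change` helper, the `config` dictionary, the two candidate changes) is made up for illustration, not taken from any real system.

```python
from typing import Callable, Iterable, Optional, Tuple

# (description, apply, revert): all hypothetical, for illustration only.
Change = Tuple[str, Callable[[], None], Callable[[], None]]


def find_fixing_change(
    changes: Iterable[Change], is_fixed: Callable[[], bool]
) -> Optional[str]:
    """Apply changes one at a time; return the first one that fixes the problem."""
    for description, apply, revert in changes:
        apply()
        if is_fixed():
            return description
        revert()  # undo it so the next change is tested in isolation
    return None


# A toy "system": the page is fixed only when it points at the new stylesheet.
config = {"css_path": "old.css", "cache": True}


def is_fixed() -> bool:
    return config["css_path"] == "new.css"


changes = [
    ("disable cache",
     lambda: config.update(cache=False),
     lambda: config.update(cache=True)),
    ("point at new stylesheet",
     lambda: config.update(css_path="new.css"),
     lambda: config.update(css_path="old.css")),
]

culprit = find_fixing_change(changes, is_fixed)
print("Fixed by:", culprit)
```

Because the cache change is reverted when it turns out not to help, the system ends up with exactly one difference from where it started, and you know which one mattered.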
Take advantage of outputs to help
Whether this is a log file or just text mixed with the normal output, take copious advantage of seeing what your system is doing behind the scenes. toString (Java), print_r (PHP) and alert (JavaScript) are your friends; even better if your IDE or framework has good troubleshooting tools that do this work for you. The point here is to not guess what's going on, but rather to know what's going on.
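As a small illustration in Python, using the standard `logging` module: the sketch below logs what a filter function received and how many rows it matched. The `filter_rows` function and the sample data are hypothetical, loosely modeled on my empty-report story above.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
logger = logging.getLogger("report")


def filter_rows(rows, status):
    """Return the rows whose 'status' field equals the requested status."""
    # Log the inputs so we KNOW what the function was asked to do.
    logger.debug("filter_rows called with status=%r on %d rows", status, len(rows))
    matched = [row for row in rows if row["status"] == status]
    logger.debug("filter_rows matched %d rows", len(matched))
    return matched


# Hypothetical sample data.
rows = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
    {"id": 3, "status": "open"},
]

# The report comes back empty; the debug output shows the filter value
# that was actually used, which points at the search criteria, not the data.
result = filter_rows(rows, "archived")
print(result)
```

Two lines of output like these would have turned my three-hour hunt into a thirty-second one: the data was there all along, and the filter value was the thing to question.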
Explain the problem to someone else
Don't take forever trying to figure out something that's driving you nuts. You're wasting time, energy, and possibly your client's money. After you've taken reasonable steps to figure out the issue, call a friend or post a forum question about it. Besides hopefully getting some help, there is a psychological phenomenon where simply articulating a problem will cause you to come up with the answer on your own. In other words, when you force your mind to describe the problem, simply thinking about how to explain it will suddenly make the answer perfectly clear. This happens to me probably once a week or so.
Almost any troubleshooting situation can be addressed with the steps above. Just like programming is more about logic than it is about syntax, troubleshooting is more about the methodology than any individual task.
