When is a Software Problem not Your Fault?
The first in a series of posts on environmental and configuration problems: how to document them and how to solve them.
Let's say you're an ISV (Independent Software Vendor), developing the next generation of CRM software. You have hundreds of customers, each with multiple installations of your software. If one of them complains of severe problems (say, intermittent crashes), does it mean your software has a major bug, or is it an environmental issue specific to that customer?
Let's look at the same problem from a different angle: your QA guys report that your latest release runs well on every machine in their lab save for one, where unexpected behavior has been observed. Do you reject the build and send your best developers to solve the problem?
Many problems that we and our customers experience look software-related but are actually environment-related. Different operating systems, different service packs and different driver versions can all contribute to abnormal software behavior.
In 2005, when many companies upgraded to Windows XP SP2 (and later to Windows Server 2003 SP1), my colleagues and I saw an increase in environmental issues. The enhanced security built into those two service packs meant certain things stopped working outright (e.g. COM+ applications). But the developers were stymied: essentially, they hadn't changed anything in the release. How come a customer who had used our software successfully for a year is complaining today?
Proving that a problem is environmental in nature, and not caused by the actual code in the software, is beneficial for several reasons:
- As long as the environmental factor is still out there, your software will continue to misbehave. You need to find that factor and eliminate it if you hope to correct the situation.
- A developer cannot help solve such an issue. There's a good chance he won't be able to recreate the scenario on his own machine, or on a QA machine, if the problem is tied to a specific customer environment. This can lead to one of two undesired outcomes:
  - The developer will try to treat the symptoms based on the description he got, which may not solve the problem and may even introduce new issues.
  - The developer will ask for a replica of the customer's environment. Now we'll have to interrogate the customer (his computer spec, OS details, drivers, registry etc. will be required) and maybe even copy several GB of customer database in order to recreate the issue. And in the end, there is no guarantee that the issue will reproduce.
- Proving that an issue is environmental frees your R&D team to deal with real bugs and allows you to correct the problem using an IT solution. It also allows you to "assign blame" (i.e., this problem is caused by Microsoft/Oracle/someone else) while working to fix the issue.
- Correcting an environmental issue is usually much easier than fixing a code defect, as it typically doesn't require a build-test-package-distribute cycle.
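One way to make the "is it environmental?" question concrete is to capture the same environment details on a working machine and on the misbehaving one, then compare them. The following is only a minimal sketch in Python, not a complete diagnostic tool: the function names `capture_environment` and `diff_environments` are hypothetical, the snapshot covers just a handful of fields from the standard `platform` module (drivers and registry settings are not covered), and the sample values in the usage section are made up for illustration.

```python
import platform
import sys


def capture_environment():
    """Collect a small, flat snapshot of the current machine's environment.

    Hypothetical helper: a real snapshot would also need drivers,
    registry settings and installed components, which are omitted here.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        # win32_ver() returns (release, version, csd, ptype); csd holds the
        # service pack level on Windows and is an empty string elsewhere.
        "service_pack": platform.win32_ver()[2],
        "python": sys.version.split()[0],
    }


def diff_environments(reference, suspect):
    """Return {key: (reference_value, suspect_value)} for every differing key."""
    differences = {}
    for key in sorted(set(reference) | set(suspect)):
        ref_value = reference.get(key)
        sus_value = suspect.get(key)
        if ref_value != sus_value:
            differences[key] = (ref_value, sus_value)
    return differences


if __name__ == "__main__":
    # Illustrative, made-up snapshots of a lab machine and a customer machine.
    lab = {"os": "Windows", "os_release": "XP", "service_pack": "SP1"}
    customer = {"os": "Windows", "os_release": "XP", "service_pack": "SP2"}
    print(diff_environments(lab, customer))
    # prints {'service_pack': ('SP1', 'SP2')}
```

In practice, each machine would save the result of `capture_environment()` to a file, and the two files would be compared afterwards. An empty diff does not prove the code is at fault, since the snapshot is far from exhaustive, but a non-empty one gives a short list of suspects to check before a developer gets involved.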



