TalkBMC

Sweeping Out the Bugs

This blog will discuss solving software problems (crashes, bugs, performance issues etc.) using advanced Application Problem Resolution software. We'll cover designing and deploying APR solutions, and analyzing the results, to get to the root cause of problems.
The first in a series of posts discussing environmental/configuration problems: how to document them and how to solve them.
Let's say you're an ISV (Independent Software Vendor), developing the next generation of CRM software. You have hundreds of customers, each with multiple installations of your software. If one of them complains of severe problems - say, intermittent crashes - does it mean your software has a major bug, or is it an environmental issue specific to that customer?

Let's look at the same problem from a different aspect: your QA guys report that your latest release is running well on every machine in their lab, save for one - where unexpected behavior has been experienced. Do you reject the build, and send your best developers to solve the problem?

Many problems that we and our customers experience seem software-related but are actually environment-related. Different operating systems, different service packs, and various driver versions can all contribute to abnormal software behavior.

In 2005, when many companies upgraded to Windows XP SP2 (and later to Windows Server 2003 SP1), my colleagues and I saw an increase in environmental issues. The enhanced security built into those two service packs meant certain things stopped working outright (e.g. COM+ applications). But the developers were stymied: essentially, they hadn't changed anything in the release. How come a customer who had used our software successfully for a year was complaining today?

Proving that a problem is environmental in nature, and not caused by the actual code in the software, is beneficial for several reasons:
  1. As long as the environmental factor is still out there, your software will continue to misbehave. You need to find that factor and eliminate it if you hope to correct the situation.
  2. A developer cannot help with solving such an issue. There's a good chance he won't be able to recreate the scenario on his own, or on a QA machine, if the problem is tied to a specific customer environment. This can lead to one of 2 undesired outcomes:
    1. The developer will try to solve the symptoms, based on the description he got. This may not solve the problem and may even introduce new issues.
    2. The developer will ask for a replica of the customer's environment. Now we'll have to interrogate the customer (his computer spec, OS details, drivers, registry etc. will be required) and maybe even copy several GB of customer database in order to recreate the issue. And in the end, there is no guarantee that the problem will actually be recreated.
  3. Proving that an issue is environmental frees your R&D team to deal with real bugs and allows you to correct the problem using an IT solution. It also allows you to "assign blame" (i.e., this problem is caused by Microsoft/Oracle/someone else) while working to fix the issue.
  4. Correcting an environmental issue is usually several orders of magnitude easier, as it typically doesn't require a build-test-package-distribute cycle.
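
To make the idea of comparing environments concrete, here is a minimal sketch in Python. It is not taken from any tool discussed in this blog: the function names (`snapshot_environment`, `diff_snapshots`) and the sample values are hypothetical, and a real comparison would also need to cover drivers, registry settings and so on. It only shows the principle of taking a snapshot on a machine that works and on one that fails, then listing what differs.

```python
import platform
import sys


def snapshot_environment():
    """Collect a few environment facts using only the standard library."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "python": sys.version.split()[0],
    }


def diff_snapshots(good, bad):
    """Return {key: (good_value, bad_value)} for every key whose values differ."""
    keys = sorted(set(good) | set(bad))
    return {
        key: (good.get(key), bad.get(key))
        for key in keys
        if good.get(key) != bad.get(key)
    }


# Hypothetical snapshots: one from a lab machine where the build works and
# one from the machine that misbehaves. The values are made up for illustration.
lab_machine = {"os": "Windows", "os_release": "XP",
               "service_pack": "SP1", "driver_x": "1.2"}
customer_machine = {"os": "Windows", "os_release": "XP",
                    "service_pack": "SP2", "driver_x": "1.2"}

differences = diff_snapshots(lab_machine, customer_machine)
for key, (good_value, bad_value) in differences.items():
    print(f"{key}: works with {good_value!r}, fails with {bad_value!r}")
```

A short list of differences does not prove the cause, but it gives you a handful of suspects to check before anyone touches the code.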
So, how do you prove that a problem stems from an environmental factor and not from a software bug? We'll discuss that in our next post.

_____
Tuesday, December 11, 2007 in ConfigurationEnvironment

Hi and welcome to the first post of "Sweeping Out the Bugs" with Guy Vider and Jeff Hyson.  For the past 4 years we have both been working in the Professional Services team at Identify Software, which was acquired by BMC last year.  Our job calls for designing, managing, deploying, and supporting implementations of BMC AppSight - the leading Problem Resolution System in the market.

Problem Resolution is all the manual steps that occur prior to fixing a bug:

1.    Gathering and documenting the details of a bug, issue, or defect (or whatever you want to call it).  This can take hours, days or weeks to finish.
  •   Ask your QA team how long it takes to fully document with screenshots a single defect - you might be surprised.
  •   Ask your technical support team how often they get all of the necessary information from a customer to solve a bug immediately.
2.    Taking that information and communicating it to the person who will need to recreate the bug on his/her computer.
  •   How many times have you heard a developer state "It works fine on my machine" and throw the issue back over the fence?
3.    Analyzing the behavior, after recreating the bug on a different computer, to get to the root cause of the problem.
  •   How many times has a developer "fixed" a problem, only to discover later that what they recreated was a different problem with similar symptoms, not the reported defect?
4.    Finally fixing the bug.

We (at Identify) have found that the first 3 steps constitute 80% of the total time spent by Application Development Organizations (ADOs) in problem resolution activity.  AppSight brings efficiency gains to an ADO by automating and accelerating these manual processes and procedures.

AppSight uses a tiny software agent called a Black Box, similar in concept to the black box on an airplane, which records all of the plane's operations in case someone later needs to analyze a problem.  The AppSight Black Box agent (software only - no hardware) can record your software applications in any environment: development, test and/or production.  If something goes wrong (crash, performance issue, logical issue, etc.) your team can retrieve the AppSight log, play it back, and analyze the results to find the root cause of the problem faster.
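
The flight-recorder idea itself can be illustrated with a toy example. The Python sketch below is not AppSight's implementation, which this post does not describe; the `FlightRecorder` class and the event names are hypothetical. It only shows the general concept of keeping a bounded history of recent operations that can be dumped and reviewed after something goes wrong.

```python
from collections import deque
import time


class FlightRecorder:
    """Toy recorder that keeps only the most recent `capacity` events."""

    def __init__(self, capacity=1000):
        # A deque with maxlen silently discards the oldest entry when full.
        self._events = deque(maxlen=capacity)

    def record(self, name, **details):
        """Store one event with a timestamp and optional details."""
        self._events.append((time.time(), name, details))

    def dump(self):
        """Return the retained events, oldest first, for later analysis."""
        return list(self._events)


# Hypothetical usage: record four steps with room for only three.
recorder = FlightRecorder(capacity=3)
for step in ["open_file", "parse_header", "load_rows", "render"]:
    recorder.record(step)

for timestamp, name, details in recorder.dump():
    print(name, details)
```

Bounding the history keeps the overhead predictable, at the cost of losing the oldest events once the buffer is full.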

AppSight automates the gathering of information to document a bug, virtually eliminates the need to recreate a bug, and shortens the analysis time needed to get to the root cause.  ADOs that use AppSight typically see a 50% reduction in the total time spent in problem resolution.  The net result of this time saving is that your ADO has more time to improve quality, add more features or go to market faster with a product.


Over the years we have managed hundreds of deployments around the globe, working with various development environments, methodologies, and languages.  We have been in the unique position of tackling million-dollar issues that flabbergasted some of the best developers at some of the biggest development organizations in the world and, always with AppSight by our side, did our best to speed up getting to the root cause of each problem.

We both come from development and project management backgrounds and believe that, had AppSight been a daily part of our work life back then, it would have been much easier to solve issues during the problem resolution stages of the SDLC.  We consider ourselves AppSight evangelists and use it daily at both work and home to solve problems; we have yet to find a single person not impressed by its capabilities.

In each post on this blog we will discuss a different bug from our past experiences and do a postmortem on how someone would approach it without and with AppSight.  We will attempt to cover common technologies/environments and avoid niche issues.

Questions, comments and requests are always welcome.  If you are a past or present AppSight user, we'd love to hear about your experiences. If you haven't converted to the light side yet, we hope you'll stick around and join us eventually :)

~Guy and Jeff



_____
Monday, December 03, 2007
Jeff Hyson and Guy Vider
