In the first post of this series, we identified the questions that must be asked, and answered, to assess quality within Problem Management quickly and efficiently. Now, we’ll explore the importance of finding the root cause of a problem after assessing its quality.
Get Control Over Recurring Problems—Get Stability
Some may say that when the “magic” is done well, the business will see few recurring problems, which was named as a performance indicator for Problem Management in the previous post. This is true; unfortunately, it is also a measure that can only be read after the fact.
When recurring problems occur, many companies take this as a signal that the Problem Management process didn’t do a good (or good enough) job of finding the problem’s root cause at the time of the first occurrence. It can take weeks or months for a problem to recur, which makes this metric a lagging and imprecise indicator of Problem Management performance.
What is really needed is a way to measure the performance (and therefore the value) of Problem Management so that a company can predict the number of recurring problems. In other words, a company needs to understand what the leading performance indicators for Problem Management are.
For simple, low-impact problems, finding measures that indicate how well problems have been solved may matter only a little: recurrence certainly isn’t welcome, but it isn’t catastrophic either. Some companies, however, occasionally face critical incidents and problems that bring them to the edge of a catastrophic business event tied to one or more IT-related events. Afterward, they resolve never, ever to go through that experience again! In those cases, counting recurring problems and tracking their trends is unlikely to be a good metric, because a single recurrence is already one too many.
A Best Practice for Magic?
Asking engineers and analysts to describe their thinking process when they’re handling problem tickets often evokes many different answers. The response is completely different when the same audience is asked how to configure a specific application or piece of hardware. Nowadays, it is widely accepted that a common approach to configuring an application or a piece of hardware has many advantages. These include:
1. A “best configuration” for the asset being used reduces variation.
2. A common understanding of how assets add value to the entire infrastructure helps with capacity management.
3. Communication about how assets are configured or changed is simplified.
4. Transition and maintenance can be seamless and of high quality.
Given these advantages, it is remarkable that there’s often no common approach to handling problems. This is why problem solving remains the “magic.”
Establishing a best practice for identifying a problem’s root cause provides advantages very similar to those of a best practice for configuring an asset. Additionally, tracking these best practices would provide the framework for a new troubleshooting language. This language would give companies the terminology to document what the “magic” looks like and how conclusions are reached.
In the next article of this four-part series, we’ll take a look at the Kepner-Tregoe method for Problem Analysis and the criteria that initiate this process.