Client
A major global investment, wealth and services management company, and one of the largest custodians of assets in the world, was looking to increase its IT stability through better root cause analysis of major client-impacting incidents.
Challenge
IT problems were being handled by a dispersed group of individuals. Once a major client-impacting incident was resolved, the investigation into its underlying cause was assigned to a technologist, who went off into a silo to try to determine the root cause and corrective actions. This work was supported by a group of “Problem Administrators” who lacked a clear strategy or aligned approach, making sustained improvement virtually impossible.
Solution
A pilot project was implemented within the infrastructure organization to improve the overall Root Cause Analysis (RCA) process, with the ultimate goal of improving stability. All Problem Administrators were trained in the Kepner-Tregoe (KT) methodology, significantly elevating their skills. After the training, they were coached internally by a certified KT coach. At the same time, the group was reorganized into a centralized, aligned function of true Problem Managers who were now responsible for handling all RCAs within the technology group. Once a root cause investigation began, the Problem Managers used KT Problem Analysis to aggressively drive the process of finding the root cause and defining corrective actions, in collaboration with the Subject Matter Experts (SMEs). In addition, a new governance model was introduced that included reporting and senior management ownership of the process. The goal was to achieve at least a 20% improvement in overall stability with respect to Priority 1 incidents, as measured by the number of high-impact incidents in the organization.
Results
The program saw a significant improvement in overall stability. Problem Managers engaged SMEs rapidly in each root cause investigation, and the disciplined Root Cause Analysis approach produced more accurate findings, so the right corrective actions were identified and implemented.
Confidence in the infrastructure team’s ability to deliver tangible and sustainable improvements to stability resulted in greater recognition of this major accomplishment at the highest levels of the organization. The ultimate outcome was a significant increase in problem data quality, a reduction in both time-to-root-cause and the number of major incidents and, consequently, a significant increase in IT stability.
At the end of the pilot program, senior IT management approved extending the approach piloted within the infrastructure space across the rest of the IT environment.
Scorecard
- 76% reduction in the average number of days to complete RCA
- 38% reduction in major-impact incidents
- Proactive identification and prevention of over 10,000 potential problems
- Recognition of this major accomplishment at the highest levels of the organization

