Automation is happening – there is no question about that. For companies’ IT departments, automation means being able to provision (and re-provision) technology resources rapidly and efficiently, which optimizes utilization and trims operational costs. These are good things, but they are not without challenges. The same automation capabilities that enable speed and business agility are making it increasingly difficult for IT organizations to solve problems when they occur in the technology environment.
The first challenge is that automation-enabled technology environments are evolving faster than the IT processes used to manage them. It’s great that your car can go fast, but if you can’t steer at high speed, you have a problem. Most IT problem management processes rely on diagnostic methods that try to reproduce the original issue by recreating the environmental conditions and repeating known sequences of events. Automation capabilities, however, have focused primarily on accelerating change (and business agility), not on providing the repeatability needed to understand what happened and why. By the time IT staff try to reproduce an error, the environment has likely changed, and they lack the tools to restore it to the state it was in when the issue occurred.
Cross-platform and multi-vendor environments complicate this problem further by moving key pieces of the technology picture outside the company’s direct control. Diagnosis then requires coordination across vendors who compete with one another and are reluctant to share information openly. Multi-vendor environments offer cost arbitrage opportunities (using the cheapest option for each specific activity), but they often leave IT without big-picture transparency, because each vendor’s diagnostic tools are unique and most are not interoperable. Third-party service management and operations management tools have sought to bridge this gap but, in most cases, fall short of giving IT the full set of problem-solving tools it needs to broker successfully across vendors in a highly dynamic, automated technology environment.
Speed and environmental complexity are not the whole story; plenty of things are complex and evolve quickly yet remain manageable. It is continuous change that is becoming problematic, because most IT processes are built around the pattern “stop, figure out what happened, fix it, and then start moving again” rather than fixing the airplane while it is flying. By the time an event or incident occurs, it is often too late to capture the actions and environmental factors that caused it; the trail of breadcrumbs has disappeared. Infrastructure that automation reconfigures dynamically makes environmental issues hard to diagnose, because there may be no way of knowing whether the same configuration will ever occur again. Analyzing patterns of cause and effect can help IT staff infer what may have caused the issue, but their hypotheses often lack the confidence needed to justify preventive action.
Over time, automation rules evolve and compound one another, growing steadily more complex until they reach a tipping point where machines can still execute them but humans can no longer interpret them. The same loss of understanding occurs when the people who wrote the rules move to other roles. Not knowing why something was implemented a certain way both prevents effective problem diagnosis and inhibits IT’s ability to make the changes needed to keep the same environment and event scenario from recurring. To address this, IT problem management staff need to determine both what was going on in the technology environment when the event occurred and why the automation rules configured it that way, something IT struggles with today.
Modern IT environments evolve quickly, and IT staff have a limited window of time to reproduce and solve problems when they are encountered. To diagnose problems in automated environments successfully, your company’s IT staff need a robust, well-structured methodology that helps them quickly survey the environment, identify what is important and initiate action.
With over 60 years of experience helping companies implement problem management processes and best practices, the experts at Kepner-Tregoe know what it takes for organizations to adapt to changes in technology. The KT methodologies can help your staff:
- Get a clear understanding of the situation at hand by asking clarifying questions
- Gather data from logs to understand the situation and circumstances when the problem occurred; in a highly automated environment, this is the closest alternative to replication
- Apply a process-driven approach that uses the data to find the cause of the problem, taking into account the challenges of cross-platform and multi-vendor environments
- Use a rational process to sift out the data that is needed without being overwhelmed by the volume of ‘noise’ that traditional big data analysis generates
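To make the log-gathering idea more concrete, here is a minimal Python sketch. It is an illustration only, not part of the KT methodology, and it assumes a hypothetical automation change log in which each entry records a timestamp, the resource and setting changed, the new value, and the automation rule that made the change. Replaying the log up to the incident time rebuilds what the environment looked like at that moment and which rule put each value there (the “what” and the “why”), while a time window narrows the log to the changes most likely to matter. The `ChangeEvent` fields, function names and sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """One entry in a hypothetical automation change log."""
    timestamp: datetime
    resource: str  # e.g. a VM, load balancer or firewall (illustrative)
    setting: str   # the configuration key that changed
    value: str     # the new value
    rule: str      # the automation rule that made the change (the "why")


def reconstruct_state(events, at):
    """Replay changes in time order up to `at` to rebuild the configuration
    as it stood at that moment, remembering which rule set each value."""
    state = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > at:
            break  # later changes did not exist yet when the incident occurred
        state[(event.resource, event.setting)] = (event.value, event.rule)
    return state


def changes_near(events, incident_time, window):
    """Keep only changes inside `window` before the incident, to cut noise."""
    start = incident_time - window
    return [e for e in events if start <= e.timestamp <= incident_time]


if __name__ == "__main__":
    # Hypothetical sample log: a server scaled down, then back up after the incident.
    t0 = datetime(2024, 1, 1, 12, 0)
    log = [
        ChangeEvent(t0, "web-01", "instance_size", "large", "baseline"),
        ChangeEvent(t0 + timedelta(minutes=30), "web-01", "instance_size",
                    "small", "scale-down-idle"),
        ChangeEvent(t0 + timedelta(hours=2), "web-01", "instance_size",
                    "large", "scale-up-load"),
    ]
    incident = t0 + timedelta(hours=1)
    state = reconstruct_state(log, incident)
    print(state[("web-01", "instance_size")])  # ('small', 'scale-down-idle')
    print(len(changes_near(log, incident, timedelta(minutes=45))))  # 1
```

In this example the environment has already been scaled back up by the time anyone investigates, so reproducing it live would show the wrong configuration; the replayed log still shows that the server was small at the time of the incident and names the rule responsible.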
Automation is happening, and there is no stopping it.