
Incident Management vs. Major Incident Management


Most companies have incident management processes in place to address everyday small-to-medium disruptions. These processes are typically based on proven customer service methodologies and/or standard IT service management practices, like ITIL. While generally effective for handling large volumes of low-impact incidents and service requests, these processes fall short when it comes to managing major incidents, which are a different category altogether. Major incidents require a unique and separate approach.

Impact and frequency

A standard incident usually affects only a few users, so longer response and resolution times are acceptable and help keep operational costs low. Major incidents, on the other hand, have significant repercussions for the business as a whole. Though thankfully rare, they can disrupt entire business units when they occur. In these situations, the financial impact of the incident far outweighs the cost of its resolution, making response speed and quality the key factors for success.
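The trade-off between resolution cost and business impact can be illustrated with a small calculation. The following is only a sketch: the `ResponseOption` type, the staffing options, and every figure in it are hypothetical values chosen for illustration, not data from any real incident.

```python
from dataclasses import dataclass


@dataclass
class ResponseOption:
    """One way of staffing an incident response (all figures hypothetical)."""
    name: str
    hourly_cost: float       # cost of the responders per hour
    hours_to_resolve: float  # expected time until service is restored


def total_cost(option: ResponseOption, impact_per_hour: float) -> float:
    """Resolution cost plus business impact accrued while the incident is open."""
    resolution_cost = option.hourly_cost * option.hours_to_resolve
    business_impact = impact_per_hour * option.hours_to_resolve
    return resolution_cost + business_impact


def cheapest(options: list[ResponseOption], impact_per_hour: float) -> ResponseOption:
    """Pick the option with the lowest total cost for a given impact rate."""
    return min(options, key=lambda o: total_cost(o, impact_per_hour))


# Hypothetical staffing options: cheap but slow versus costly but fast.
service_desk = ResponseOption("service desk", hourly_cost=40.0, hours_to_resolve=8.0)
expert_team = ResponseOption("expert team", hourly_cost=600.0, hours_to_resolve=2.0)
options = [service_desk, expert_team]

# Standard incident: a few users affected, low impact per hour.
standard_choice = cheapest(options, impact_per_hour=50.0)
# Major incident: a whole business unit down, high impact per hour.
major_choice = cheapest(options, impact_per_hour=20_000.0)

print(standard_choice.name, major_choice.name)
```

With these assumed numbers, the slower and cheaper option wins for the standard incident, while the faster and more expensive option wins once the hourly business impact dominates the total.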

Skills and roles involved

In general, service desk personnel with limited training and technical expertise handle most incidents. Complex issues are escalated to second- or third-tier support teams with more specialized knowledge. However, the goal remains to resolve issues using the least technically skilled (and least costly) resources available. Major incidents call for a different strategy altogether. Here, the focus should be on engaging the individuals who can resolve the disruption the fastest, thus minimizing extended business impact. Typically, these resources are highly skilled (and correspondingly high-cost) subject matter experts.

Processes

Recent years have seen a shift in incident management processes toward self-service, automation, and asynchronous support interactions (e.g., email-based interactions with global call center teams). This approach is designed to optimize the scalability of incident management processes while reducing human interaction. However, this emphasis on scalability often comes at the expense of the time needed to resolve more complex disruptions. Major incident processes, therefore, must be optimized in the opposite direction, prioritizing solution effectiveness and speed of resolution over resource cost and automation.

Communication

In typical incident scenarios, management might perceive the need for communication as a failure. Major incidents are different in that active and broad communication with stakeholders is not only helpful for accurately assessing the impact but also essential for managing expectations and assuring stakeholders that the situation is under control. In many major incidents, the perception created by communication plays a more significant role in shaping the overall impact than the technical problem and its associated symptoms. Effective communication during a major incident needs to address four distinct groups of stakeholders:

  • The affected user community whose activities are directly impacted by the incident
  • Stakeholders who are indirectly or potentially affected and whose trust is crucial to managing the incident
  • Internal teams and subject matter experts involved in diagnosing and resolving incidents (this may include vendors)
  • Support and IT Management

Executive involvement and decision making

Major incidents almost always require the involvement of an executive to help assess impacts, facilitate communication, and make key decisions to remove obstacles. Often, the actions needed to resolve a major incident extend across different business units, raising questions around decision-making authority. Without clear guidelines, this can quickly lead to overlapping authorities and confusion. A major incident management process should include cross-functional guidelines for decision-making to prevent delays and misunderstandings.

Alleviating symptoms vs. preventing recurrence

The primary goal during a major incident is to mitigate impacts and take corrective actions to restore normal business operations. Understanding the root cause and implementing measures to prevent recurrence fall under the domain of problem management. Given that a major incident has substantial business impact, it’s common for executives to follow up actively to ensure that the root cause is identified and that preventive measures are implemented promptly.

However, in the chaos of managing an active major incident, crucial diagnostic information is often lost, which complicates efforts to pinpoint the underlying cause.

To avoid both problems, the pressure for a prompt root cause and the loss of the diagnostic information needed to find it, a highly integrated, comprehensive incident and problem management process is essential. This process should actively capture and document critical “cause information” to ensure continuous service improvement. Only in this way can true IT stability be achieved over the long term.

Don’t wait until it’s too late

While executives cannot control when major incidents will happen, they can control how the company responds to and manages them. Excellent overall service, including an effective and well-understood major incident management process, is key to responding quickly, addressing immediate impacts, protecting the company’s reputation, and reducing operational and customer risks.

As a leader in problem-solving, Kepner-Tregoe has been working with clients for over 60 years to enhance their ability to manage major incidents in both operations and IT, supporting them in achieving service excellence.

Take a look at our training page or contact us to learn more.
