How to plan for major incidents in ITSM
- Business solutions
- IT Services
- Risk management
August 17, 2017
When is an incident a major incident?
An incident becomes major when its potential impact or urgency is judged to be significant. What counts as significant differs between organizations: each defines it for itself, based on its own business goals and objectives.
When defining the scope of a major incident, it's also important to consider what level of impact the business will tolerate. This tolerance is usually based on risk factors.
When working with organizations, I’ve often differentiated between ‘priority 1 incidents’ and ‘major incidents’, reserving ‘major incidents’ for those situations where, if not resolved quickly, business and/or IT continuity plans may need to be invoked.
Priority 1 incidents have a broad impact on users but don't necessarily affect the revenue-generating functions of the business unless left unresolved. Email is a typical example. A major incident, by contrast, is one that prevents the organization from operating. In a hospital, for instance, this might be a failure of the patient management system or, more generally, a significant power outage.
What are the key things to consider in a major incident?
From an IT perspective, probably the most important thing to remember is the business impact, goals and objectives.
It’s easy to get lost in fighting the fire, but ultimately objectives and resolution strategies should be aligned with the priorities of the business. For example, if the business decides that its continuity plan should be invoked, then IT must ensure that its approach to incident resolution addresses not only restoring the ‘normal’ service but also the implications of operating under the continuity plan, such as people being told to work from home or to access systems through an alternative mechanism.
In these scenarios, the business will be looking to IT for guidance on how to react, so make sure you focus on outcomes that support and underpin the business goals and objectives.
Personal safety should also be kept in mind; major incidents do not always originate from IT and broad teams from across the business may be needed to ensure business continuity and/or recovery.
How can you prepare for a major incident?
Plan for major incidents and establish a clear process for staff to follow, from the point of escalation through to resolution.
Discuss priorities with the business and agree which systems are business-critical, along with the tolerance levels that, if exceeded, will trigger a major incident. Make sure everyone is aware of the continuity plans and that those plans are tested. Today, continuity measures are often automated, so with good preparation a major incident may be resolved in little or no time.
Even then, the incident should still be investigated further through problem management, with steps put in place to prevent it from recurring.
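The agreement on business-critical systems and tolerance levels, together with the distinction between priority 1 and major incidents, can be written down as a simple classification rule. The Python sketch below is illustrative only: the `Incident` type, the `classify` function, the service names, the tolerance figures and the `BROAD_IMPACT_USERS` threshold are all hypothetical, and in practice these values would come from the discussion with the business and would usually live in your ITSM tool rather than in code.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Incident:
    service: str          # affected service (hypothetical identifiers)
    outage_minutes: int   # how long the service has been down so far
    users_affected: int   # rough count of affected users


# Hypothetical tolerance levels agreed with the business: the longest outage,
# in minutes, each business-critical service may suffer before a major
# incident is declared. Names and figures are illustrative only.
TOLERANCES: Mapping[str, int] = {
    "patient-management": 15,
    "power": 0,
}

# Hypothetical threshold for "broad impact on users".
BROAD_IMPACT_USERS = 500


def classify(incident: Incident,
             tolerances: Mapping[str, int] = TOLERANCES,
             broad_impact_users: int = BROAD_IMPACT_USERS) -> str:
    """Return 'major', 'priority 1' or 'standard' for an incident."""
    tolerance = tolerances.get(incident.service)
    if tolerance is not None and incident.outage_minutes > tolerance:
        # A business-critical service has exceeded its agreed tolerance,
        # so continuity plans may need to be invoked.
        return "major"
    if tolerance is not None or incident.users_affected >= broad_impact_users:
        # Either a critical service still within tolerance, or broad user
        # impact on a non-critical service such as email.
        return "priority 1"
    return "standard"


if __name__ == "__main__":
    print(classify(Incident("email", 30, 2000)))              # priority 1
    print(classify(Incident("patient-management", 20, 50)))   # major
    print(classify(Incident("patient-management", 5, 50)))    # priority 1
```

Treating a critical service that is still within tolerance as priority 1 reflects the idea that an incident escalates to major only once the agreed tolerance is exceeded; an organization could equally choose to declare a major incident immediately for certain services, which the sketch expresses by setting a tolerance of zero.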
Do roles change in a major incident scenario?
Roles are quite different in major incident management: usually the incident manager, or a delegate, is responsible for the overall management and coordination of the resources and teams required to investigate and resolve the incident.
Generally, it is recommended that a major incident team be established as soon as the incident is detected, with its composition determined by the skills and capabilities required to handle the incident effectively.
The team may comprise senior managers, technical staff, suppliers and business stakeholders, and members may not be physically located in the same place.
What format should incident teams take?
Often a two-pronged approach is effective, with one team focusing on workarounds and service restoration and another focusing on problem management: identifying the root cause and finding an effective solution quickly.
This stops problem management from compromising the incident process in the race to restore service, while still allowing the root cause and a solution to be found.
What’s the difference between incident management and problem management?
Incidents and problems are different: incidents are the symptoms of problems and often the first indication that a problem exists.
The two processes have different but related objectives: incident management is focused on restoring service aligned with agreed service levels, whereas problem management is about root cause analysis, understanding the underlying cause of incidents and finding a permanent solution.
The business weighs any proposed solution in terms of cost versus value. Acceptance and implementation of the solution may depend on several factors outside IT, and in some cases the business may decide not to implement a solution at all.
Where can practitioners get more advice on major incident management?
There is a wealth of guidance available on major incident management, which is closely related to continuity management. ITIL® provides solid guidance on this topic but, as with anything in ITSM, each organization should treat the process steps as a guide to what should be done rather than to how the activities should be performed. The right answer is whatever is best for your organization.