Incident Management
Incident Management is the support discipline which ensures a timely response to malfunctions within the IT systems. Incident Management consists of the processes by which an incident is reported, recorded, resolved and closed. The goal of Incident Management is to restore the operational service as quickly as possible with the minimum disruption to the business users. Fulfilment of that goal ensures the delivery of the best possible service availability and performance to the business.
This discipline within Service Management ensures the most efficient and effective use of resources to support the operations of the business whilst also providing a basis for ongoing system and process enhancements to improve the service.
The Incident Management process may be broken down into six distinct sub-processes:
- Detection and recording
- Classification and initial support
- Investigation and diagnosis
- Resolution and recovery
- Incident closure
- Ownership, tracking and monitoring
Ownership, tracking and monitoring is an ongoing process which covers the whole incident life cycle. The agent taking the call retains ownership throughout the life of the call, regardless of who is working on the solution, and is responsible for tracking progress to ensure that Service Level Agreements are met. Service Desk management holds a monitoring brief, ensuring that resources are available, progress is satisfactory and communication is both appropriate and timely.
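The split between ownership and the person doing the work can be sketched in code. The following is a minimal illustration only: the class name `TrackedIncident`, its fields and the idea of a single SLA target duration are assumptions made for the example, not part of any particular Service Desk tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrackedIncident:
    """Hypothetical record used to track one call against its SLA."""
    owner: str              # agent who took the call; stays fixed
    assignee: str           # whoever is currently working on the solution
    opened_at: datetime
    sla_target: timedelta   # assumed: one resolution target per call

    def reassign(self, new_assignee: str) -> None:
        # Passing the work to another group does not change ownership.
        self.assignee = new_assignee

    def time_remaining(self, now: datetime) -> timedelta:
        # Negative once the SLA deadline has passed.
        return self.opened_at + self.sla_target - now

    def is_breached(self, now: datetime) -> bool:
        return self.time_remaining(now) < timedelta(0)
```

For example, an owner could call `time_remaining` at regular intervals and chase the assignee when the value drops below an agreed threshold.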
The other five sub-processes are performed in sequence, as illustrated across.
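The sequential sub-processes can be modelled as an ordered set of stages. This is a rough sketch: the `Stage` enum and the `next_stage` helper are hypothetical names, and the shortcut that skips investigation when initial support resolves the call is an assumption drawn from the stage descriptions rather than a stated rule.

```python
from enum import Enum


class Stage(Enum):
    DETECTION_AND_RECORDING = 1
    CLASSIFICATION_AND_INITIAL_SUPPORT = 2
    INVESTIGATION_AND_DIAGNOSIS = 3
    RESOLUTION_AND_RECOVERY = 4
    INCIDENT_CLOSURE = 5


def next_stage(stage, resolved_at_initial_support=False):
    """Return the stage that follows `stage`, or None after closure."""
    # Assumption: a call resolved at initial support needs no
    # specialist investigation, so that stage is skipped.
    if (stage is Stage.CLASSIFICATION_AND_INITIAL_SUPPORT
            and resolved_at_initial_support):
        return Stage.RESOLUTION_AND_RECOVERY
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(stage)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None
```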
Detection and Recording
Incidents in an IT system may be detected in several ways: customers reporting abnormal behaviour, monitoring processes picking up a system malfunction, or an audit showing up irregularities that point to an underlying issue. Any of these detection methods will lead to a call being raised on the Service Desk, where the details of the incident will be recorded in the Service Desk database. The basic details are the date, location, contact name, system affected and, most importantly, the symptoms encountered.
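The basic details listed above map naturally onto a simple record type. The sketch below is illustrative only: the class `IncidentRecord`, the field names and the three detection source labels are assumptions chosen for the example, not the schema of any real Service Desk database.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed labels for the three detection routes described above.
DETECTION_SOURCES = {"customer", "monitoring", "audit"}


@dataclass
class IncidentRecord:
    """Hypothetical record created when a call is raised."""
    reported_at: datetime
    location: str
    contact_name: str
    system_affected: str
    symptoms: str
    detected_by: str = "customer"

    def __post_init__(self):
        # Symptoms are the most important detail, so refuse blank ones.
        if not self.symptoms.strip():
            raise ValueError("symptoms must be recorded")
        if self.detected_by not in DETECTION_SOURCES:
            raise ValueError(f"unknown detection source: {self.detected_by}")
```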
Classification and Initial Support
The details supplied enable the Service Desk Agents to identify the problem and correctly classify the call by services affected, skills required to resolve it, and business impact (and hence severity), and to establish whether the cause and resolution are already known. Initial support is undertaken to provide a timely resolution to the customer's satisfaction. This will usually be possible for incidents where a known error has occurred or the engineer has the knowledge and expertise to identify the cause quickly and rectify the fault.
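Classification and the decision to attempt initial support can be sketched as follows. The impact levels, the mapping from impact to a numeric severity and the routing rule are all assumptions made for illustration; a real Service Desk would define its own scales.

```python
from dataclasses import dataclass

# Assumed scale: severity 1 is the most urgent.
SEVERITY_BY_IMPACT = {"high": 1, "medium": 2, "low": 3}


@dataclass
class Classification:
    """Hypothetical classification attached to a call."""
    services_affected: list
    skills_required: list
    business_impact: str   # "high", "medium" or "low"
    known_error: bool      # cause and resolution already known

    @property
    def severity(self) -> int:
        # Severity follows directly from business impact.
        return SEVERITY_BY_IMPACT[self.business_impact]


def route(classification, desk_skills):
    """Decide whether the Service Desk can attempt initial support."""
    has_skills = set(classification.skills_required) <= set(desk_skills)
    if classification.known_error or has_skills:
        return "initial_support"
    return "second_level"
```

Here `route` returns `"second_level"` only when the error is not already known and the desk lacks at least one of the required skills.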
Investigation and Diagnosis
This stage comes into play if the issue cannot be resolved at Initial Support and is escalated to a specialist support group (Second Level). At this stage the process may become iterative, with input from specialists used to eliminate possible causes. The objective is to resolve the incident by providing either a resolution or a work-around which enables business operations to be resumed. At all times the call is updated to ensure that knowledge is retained and to build a comprehensive record of actions and results.
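The iterative elimination of possible causes, with every step written back to the call, might look like the sketch below. The function `diagnose`, the `is_cause` callback and the list-based work log are hypothetical and stand in for whatever checks and call-logging a specialist group actually uses.

```python
def diagnose(possible_causes, is_cause, work_log):
    """Test each candidate cause in turn, logging every result.

    possible_causes: ordered list of candidate causes
    is_cause:        callable returning True if a candidate is the cause
    work_log:        list that receives (cause, outcome) entries
    """
    for cause in possible_causes:
        confirmed = is_cause(cause)
        # Record the action and its result so knowledge is retained.
        work_log.append((cause, "confirmed" if confirmed else "eliminated"))
        if confirmed:
            return cause
    return None  # nothing confirmed; further specialist input is needed
```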
Resolution and Recovery
This step occurs when a solution or a work-around is available and has been passed to the Service Desk for implementation. Any recovery work required will be carried out in the background by specialist staff prior to the system being handed back to the business for operational use.
Incident Closure
An incident may be closed when resolution and recovery have been completed to the satisfaction of the business customer and normal operations have been resumed. However, the details of the closure action must be recorded accurately to build up a body of knowledge for future reference.
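The closure conditions above can be expressed as a single check. This is a minimal sketch; the function name `can_close` and its parameters are assumptions introduced for the example.

```python
def can_close(resolution_and_recovery_complete,
              customer_satisfied,
              operations_resumed,
              closure_notes):
    """Return True only when every closure condition is met."""
    conditions_met = (resolution_and_recovery_complete
                      and customer_satisfied
                      and operations_resumed)
    # Closure details must actually be written down, not left blank.
    notes_recorded = bool(closure_notes.strip())
    return conditions_met and notes_recorded
```

Treating blank closure notes as a reason to keep the call open reflects the requirement that closure details feed the body of knowledge for future incidents.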