Service Operation
Event Management
Introduction
Event Management is an important process in the Service Operation lifecycle stage because it detects events, records them and keeps the record of past and current events up to date for future reference.
An Event is defined as any detectable or discernible occurrence that has significance for the management of the IT infrastructure or the delivery of an IT service, and for the evaluation of the impact a deviation might cause to the services.
Purpose and Objectives
The purpose of Event Management is to enable stability in IT service delivery and support by monitoring all events that occur throughout the IT infrastructure, to allow for “normal” Service Operation and to detect and escalate exceptions.
The objectives of Event Management are to detect Events, understand them and determine appropriate control actions.
If events are programmed to communicate operational information
as well as warnings and exceptions, they may be used as a
basis for automating many routine Operations Management activities.
Event Management also plays a role in understanding actual performance and behavior against design standards and Service Level Agreements (SLAs).
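The detect, understand and act sequence described above can be sketched in code. This is only an illustration under stated assumptions: the three event types follow the "operational information, warnings and exceptions" wording of this section, while the metric name, the threshold values and the action names are hypothetical and not taken from ITIL.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"  # operational information, normal operation
    WARNING = "warning"              # a threshold is being approached
    EXCEPTION = "exception"          # abnormal operation


@dataclass
class Event:
    ci: str        # configuration item that raised the event
    metric: str    # name of the monitored metric
    value: float   # measured value


# Hypothetical thresholds per metric: (warning_level, exception_level).
THRESHOLDS = {"disk_used_pct": (80.0, 95.0)}


def classify(event: Event) -> EventType:
    """Understand the event by comparing its value with the thresholds."""
    warning_level, exception_level = THRESHOLDS[event.metric]
    if event.value >= exception_level:
        return EventType.EXCEPTION
    if event.value >= warning_level:
        return EventType.WARNING
    return EventType.INFORMATIONAL


def control_action(event_type: EventType) -> str:
    """Determine a control action; the action names are assumptions."""
    actions = {
        EventType.INFORMATIONAL: "log",
        EventType.WARNING: "alert_operations",
        EventType.EXCEPTION: "raise_incident",
    }
    return actions[event_type]
```

For example, control_action(classify(Event("srv01", "disk_used_pct", 97.0))) selects the exception action, which in this sketch hands the event over to Incident Management.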
Scope
Event Management can be applied to any aspect of service management that needs to be controlled and which can be automated.
This includes:
• Configuration items (CIs)
Some CIs will be included because they need to stay in a constant state
• Environmental conditions
• Software licence monitoring
Usage is monitored to ensure optimum and legal licence utilization and allocation
• Security
• Normal activity
Roles
It is not common for an organization to appoint an Event Manager. However, it is important that Event Management procedures are coordinated across the Service Operation functions.
Roles of Service Operation functions
The following Service Operation functions can play a role in the Event Management process:
• Service Desk:
Investigate events and ensure appropriate action is taken for those that require attention.
• Technical and Applications Management:
Classify Events during Service Design
Test the Service during Service Transition
Analyze Events during Service Operation
• IT Operations Management:
Event Monitoring (often in the IT Operations Bridge) and first-line response for Events.
Incident Management
Introduction
The Incident Management process restores disrupted services as quickly as possible. Incident Management deals with all Incidents, including failures, questions or queries reported by users, by technical staff or by Event monitoring tools.
Purposes and Objectives
Incident management aims to manage all reported Incidents.
The purposes of Incident Management are to:
• Restore normal Service Operation as quickly as possible
• Minimize the adverse impact on business operations
• Ensure service quality and availability are maintained
The objectives of Incident Management are to:
• Ensure that standardized methods and procedures are used for efficient and prompt response, analysis, documentation, ongoing management and reporting of incidents
• Increase visibility and communication of incidents to business and IT support staff
• Enhance business perception of IT through use of a professional approach in quickly resolving and communicating incidents when they occur
• Align incident management activities and priorities with those of the business
• Maintain user satisfaction with the quality of IT services.
Scope
The scope of Incident Management includes the following:
• Incident Management includes all Incidents and any Event which could disrupt a service.
• Incident Management also covers Incidents that are reported by users, technical staff and monitoring tools.
Concepts
There are some basic concepts in Incident Management that are important in understanding this ITIL® process:
• Timeframes must be agreed for all Incident handling stages and captured as targets within Operational Level Agreements (OLAs) and Underpinning Contracts (UCs). All support groups must know these timeframes, and Service Management tools should be configured to automate them accordingly.
• An Incident Model is a way of predefining the steps that should be taken to handle a particular type of Incident in an agreed way. This ensures that 'standard' Incidents are handled in a pre-defined way and within pre-defined timeframes.
An Incident Model includes the following:
Steps taken to handle the Incident
Chronological order
Responsibilities
Timeframes
Escalation procedures
• Major Incidents are incidents with high potential business impact, high urgency and causes that are known but with no existing work-around available. For Major Incidents, a separate procedure with shorter timeframes and greater urgency must be used.
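The Incident Model concept above maps naturally onto a small data structure. The sketch below is a minimal illustration, assuming one record per step that carries its responsible party and agreed timeframe; the class names, field names and the sample "password reset" model are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import List


@dataclass
class Step:
    description: str       # what has to be done
    responsible: str       # role or support group that performs the step
    timeframe: timedelta   # agreed time allowed for the step


@dataclass
class IncidentModel:
    name: str
    # Steps are stored in the chronological order in which they are taken.
    steps: List[Step] = field(default_factory=list)
    escalation_procedure: str = ""

    def total_timeframe(self) -> timedelta:
        """Sum of the agreed timeframes of all steps."""
        return sum((s.timeframe for s in self.steps), timedelta())


# Hypothetical model for a 'standard' Incident.
password_reset = IncidentModel(
    name="Password reset",
    steps=[
        Step("Verify user identity", "Service Desk", timedelta(minutes=5)),
        Step("Reset password", "Service Desk", timedelta(minutes=10)),
        Step("Confirm login with user", "Service Desk", timedelta(minutes=10)),
    ],
    escalation_procedure="Escalate to 2nd line if identity cannot be verified",
)
```

Keeping the timeframe on each step makes it possible to check the model as a whole against the targets agreed for that type of Incident.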
Process Activities
Process activities conducted in Incident Management are:
• Identification:
Work cannot begin on dealing with an incident until it is known that an incident has occurred or is going to occur. Identification is therefore the first step.
• Logging:
All Incidents must be logged and date/time stamped.
• Categorization:
Assigning a category for later reporting and for determining appropriate solution groups.
• Prioritization:
Determined by impact and urgency
• Initial Diagnosis:
The Service Desk carries out initial diagnosis to try to discover the full symptoms of the incident and to determine exactly what has gone wrong and how to correct it. Diagnostic scripts and known error information can be most valuable in allowing earlier and accurate diagnosis. If possible, the incident will be resolved in this phase, and closed if the resolution is successful.
• Investigation and Diagnosis:
Investigate and diagnose Incidents. This is performed either by the Service Desk or (through functional escalation) by 2nd or 3rd line support.
• Resolution and Recovery:
A resolution is identified and tested, recovery is completed, the service is restored and the Incident record is updated.
• Closure:
Performed by the Service Desk, to check that Incidents are fully resolved and to ensure users are satisfied and agree to close the Incident.
There are certain questions that need to be answered when performing
certain activities. These questions will determine the subsequent activity required.
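The way a question decides the next activity can be sketched as a small transition function. This is an assumption-laden illustration: it models only one question ("was the Incident resolved during initial diagnosis?"), and the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Activity(Enum):
    IDENTIFICATION = 1
    LOGGING = 2
    CATEGORIZATION = 3
    PRIORITIZATION = 4
    INITIAL_DIAGNOSIS = 5
    INVESTIGATION_AND_DIAGNOSIS = 6
    RESOLUTION_AND_RECOVERY = 7
    CLOSURE = 8


def next_activity(current: Activity, resolved: bool = False) -> Optional[Activity]:
    """Return the activity that follows `current`, or None after Closure.

    `resolved` answers the question asked at Initial Diagnosis: if the
    Service Desk has already resolved the Incident, the flow goes straight
    to Closure instead of continuing with Investigation and Diagnosis.
    """
    if current is Activity.CLOSURE:
        return None
    if current is Activity.INITIAL_DIAGNOSIS and resolved:
        return Activity.CLOSURE
    return Activity(current.value + 1)
```

A fuller version would add further questions, for example whether the Incident is a Major Incident or whether escalation is needed.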
Prioritization
It is important to know the correct terms in prioritization
so that an Incident can be dealt with accordingly:
• Priority:
The priority is based on a combination of impact and urgency. This is often captured in a priority table.
• Impact:
Determined by the effect upon the activities of the business. This is often measured in terms of the number of users affected. Impact is not about the technical complexity of resolution. When determining impact, Service Desk staff should take into consideration:
Risk to life or limb
The number of services affected
The level of financial losses
Effect upon business reputation
Regulatory or legislative breaches
• Urgency:
Determined by how quickly the Incident needs to be resolved. Related to how critical the service is for the business processes.
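A priority table of the kind mentioned above can be expressed as a simple lookup. The sketch below assumes three levels for impact and urgency and priority codes from 1 (highest) to 5 (lowest); these values are illustrative, and each organization agrees its own table.

```python
from enum import IntEnum


class Level(IntEnum):
    HIGH = 1
    MEDIUM = 2
    LOW = 3


# Illustrative priority table keyed by (impact, urgency).
# High impact with high urgency gives priority 1, low with low gives 5.
PRIORITY_TABLE = {
    (impact, urgency): int(impact) + int(urgency) - 1
    for impact in Level
    for urgency in Level
}


def priority(impact: Level, urgency: Level) -> int:
    """Look up the priority code for a given impact and urgency."""
    return PRIORITY_TABLE[(impact, urgency)]
```

In this sketch an Incident with high impact but low urgency receives the same priority code (3) as one with medium impact and medium urgency.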
Escalation
Escalation takes place when the person handling the incident lacks the knowledge, expertise or authority to solve the Incident.
There are two types of escalation:
• Functional Escalation:
Also called horizontal escalation; it takes place due to a lack of knowledge or expertise.
• Hierarchical Escalation:
Also called vertical escalation; it occurs when Major Incidents are reported or when the Incident cannot be resolved within an agreed timescale and may breach Service Level Agreements (SLAs).
The Service Desk must ensure that SLA resolution times are not exceeded when dealing with Incidents. It is responsible for tracking and tracing the incidents.
Escalation never turns an Incident
into a Problem, although it may result in ownership of an Incident passing to
the Problem Manager for administrative reasons and/or the identification of an associated Problem.
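The two escalation types can be sketched as a decision function. This is a simplified illustration, not a prescribed procedure: the record fields are hypothetical, and the warning_fraction parameter (the share of the target resolution time after which a breach is considered likely) is an assumption. When both types would apply, this sketch reports only the hierarchical one.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Escalation(Enum):
    FUNCTIONAL = "functional"      # horizontal: more knowledge or expertise needed
    HIERARCHICAL = "hierarchical"  # vertical: management attention needed


@dataclass
class Incident:
    logged_at: datetime
    target_resolution: timedelta     # resolution time agreed in the SLA
    is_major: bool = False
    handler_can_resolve: bool = True  # False if knowledge or expertise is lacking


def escalation_needed(
    incident: Incident, now: datetime, warning_fraction: float = 0.8
) -> Optional[Escalation]:
    """Decide which escalation, if any, the Incident currently needs."""
    elapsed = now - incident.logged_at
    at_risk = elapsed >= incident.target_resolution * warning_fraction
    if incident.is_major or at_risk:
        return Escalation.HIERARCHICAL
    if not incident.handler_can_resolve:
        return Escalation.FUNCTIONAL
    return None
```

Running such a check periodically is one way the Service Desk could track Incidents against their SLA resolution times.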
Interfaces
Examples of interfaces with incident management are listed below
for each service lifecycle stage.
Service Design
• Service Level Management
The ability to resolve incidents in a specified time is a key part of delivering an agreed level of service. Incident management enables SLM to define measurable responses to service disruptions. It also provides reports that enable SLM to review SLAs objectively and regularly. In particular, incident management is able to assist in defining where services are at their weakest, so that SLM can define actions as part of the service improvement plan (SIP).
SLM defines the acceptable levels of service within which incident management works, including:
Incident response times
Impact definitions
Target fix times
Service definitions, which are mapped to users
Rules for requesting services
Expectations for providing feedback to users.
• Capacity Management
Incident management provides a trigger for performance monitoring where there appears to be a performance problem. Capacity management may develop workarounds for incidents.
• Availability Management
Availability management will use incident management data to
determine the availability of
IT services and look at where the incident lifecycle can be improved.
Service Transition
• Service Asset and Configuration Management
This process provides the data used to identify and progress incidents. One of the uses of the CMS is to identify faulty equipment and to assess the impact of an incident. The CMS also contains information about which categories of incident should be assigned to which support group. In turn, incident management can maintain the status of faulty CIs. It can also assist service asset and configuration management to audit the infrastructure when working to resolve an incident.
• Change Management
Where a change is required to implement a workaround or resolution, this will need to be logged as an RFC and progressed through change management. In turn, incident management is able to detect and resolve incidents that arise from failed changes.
Service Operation
• Problem Management
For some incidents, it will be appropriate to involve problem management to investigate and resolve the underlying cause to prevent or reduce the impact of recurrence. Incident management provides a point where these are reported. Problem management, in return, can provide known errors for faster incident resolution through workarounds that can be used to restore service.
• Access Management
Incidents should be raised when unauthorized access attempts and security breaches have been detected. A history of incidents should also be maintained to support forensic investigation activities and resolution of access breaches.
Metrics
The Incident Manager should prepare reports that help to judge the efficiency and effectiveness of the Incident Management process.
Key Performance Indicators (KPIs):
• Percentage of Incidents handled within the agreed timescale
• Percentage of Incidents assigned correctly
• Percentage of Incidents resolved by the Service Desk
• Number of Incidents processed per agent
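The KPIs above are simple ratios over the Incident records, as the following sketch shows. The record fields and the dictionary keys are hypothetical, and "per agent" is taken here to mean the average number of Incidents per distinct agent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IncidentRecord:
    handled_within_timescale: bool
    assigned_correctly: bool
    resolved_by_service_desk: bool
    agent: str  # agent who processed the Incident


def percentage(part: int, total: int) -> float:
    """Share of `part` in `total` as a percentage; 0.0 for an empty total."""
    return 100.0 * part / total if total else 0.0


def kpis(records: List[IncidentRecord]) -> Dict[str, float]:
    """Compute the four KPIs listed above for a set of Incident records."""
    total = len(records)
    agents = {r.agent for r in records}
    return {
        "pct_within_timescale": percentage(
            sum(r.handled_within_timescale for r in records), total),
        "pct_assigned_correctly": percentage(
            sum(r.assigned_correctly for r in records), total),
        "pct_resolved_by_service_desk": percentage(
            sum(r.resolved_by_service_desk for r in records), total),
        "incidents_per_agent": total / len(agents) if agents else 0.0,
    }
```

In practice these figures would be produced by the Service Management tool's reporting; the sketch only makes the definitions explicit.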
Challenges
Challenges in Incident Management are:
• Detect Incidents as early as possible
• Convince all staff to log all Incidents
• Have a good understanding of SLAs
Roles
Roles in Incident Management are:
• Incident Manager:
Responsible for producing, managing, maintaining, monitoring and developing the Incident Management process and systems.
• Service Desk:
Tasks undertaken by the Service Desk include handling first-line Incidents and acting as the SPOC (Single Point of Contact) for IT users on a daily basis. The Service Desk must also manage communications with end-users.
• 2nd, 3rd, nth lines:
Consist of specialists who handle escalated Incidents or Incidents that involve third parties.