Overview
The goal of ITSM Problem Management is to resolve the root cause(s) of Incidents. It does so by minimizing the adverse impact of Incidents and Problems within the IT infrastructure, preventing the recurrence of Incidents, and recommending proactive measures to prevent future Incidents.
A Problem is an unknown underlying cause of one or more Incidents, and a 'known error' is a Problem that has been successfully diagnosed and for which a workaround has been identified. The Central Computer and Telecommunications Agency (CCTA) defines Problems and known errors as conditions often identified as a result of multiple Incidents that exhibit common symptoms. A Problem can also be identified from a single significant Incident, indicative of a single error, whose cause is unknown but whose impact is significant.
Problem Management is different from Incident Management. The principal purpose of Problem Management is to find and resolve the underlying causes of Incidents and prevent their recurrence, whereas Incident Management records the Incident.
How can I get assistance?
Anyone with a valid VUMC ID and password can initiate a communication about a problem, also called an Incident Ticket, in Pegasus. If you believe a Problem exists as defined above, call the VUMC IT/NTT Help Desk immediately at 615-343-HELP/3-4357 and ask that an email be sent to the Problem Manager to discuss the issue and gather as much initial information as possible.
If a true Problem exists, an initial email helps everyone understand expectations and whether Problem Management is the appropriate course of action. The Problem Manager may provide a blank template for the customer/requestor to complete and send back. This template captures the basic information needed to determine whether an Incident exists and begins the discovery process.
Design Analysis
Design Analysis is an aspect of Proactive Problem Management. It uses reviews of proposed or existing system or application designs to identify errors or deficiencies that could potentially lead to errors and Problems in the current environment.
Known Error
A Known Error is a Problem for which the root cause has been identified, but no permanent fix has been implemented. It remains a Known Error until it is permanently fixed by a Change.
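The lifecycle described in this definition (a Problem becomes a Known Error once its root cause is identified, and stays one until a Change permanently fixes it) can be sketched as a small state model. This is an illustrative sketch only: the `Status` values, the `ProblemRecord` class, its field names, and the ticket identifier `CHG-0001` are hypothetical and do not reflect how Pegasus or any other ITSM tool stores records.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROBLEM = "problem"          # root cause still unknown
    KNOWN_ERROR = "known error"  # root cause identified, no permanent fix yet
    RESOLVED = "resolved"        # permanently fixed by a Change


@dataclass
class ProblemRecord:
    """Hypothetical record used only to illustrate the status transitions."""
    summary: str
    status: Status = Status.PROBLEM
    root_cause: Optional[str] = None
    change_id: Optional[str] = None

    def identify_root_cause(self, root_cause: str) -> None:
        # A Problem becomes a Known Error once its root cause is identified.
        if self.status is not Status.PROBLEM:
            raise ValueError("root cause can only be recorded on an open Problem")
        self.root_cause = root_cause
        self.status = Status.KNOWN_ERROR

    def apply_change(self, change_id: str) -> None:
        # A Known Error stays a Known Error until a Change permanently fixes it.
        if self.status is not Status.KNOWN_ERROR:
            raise ValueError("only a Known Error can be closed by a Change")
        self.change_id = change_id
        self.status = Status.RESOLVED


record = ProblemRecord("Nightly VPN timeouts")
record.identify_root_cause("Expired certificate on gateway")
known_error_status = record.status
record.apply_change("CHG-0001")
print(known_error_status.value, "->", record.status.value)
```

Rejecting out-of-order transitions keeps a record from being marked as fixed before its root cause is known, which matches the order given in the definition.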
Proactive Problem Management
Proactive Problem Management identifies and solves Problems and Known Errors before Incidents occur and customers are negatively impacted. It seeks to eliminate the causes of Incidents or Problems before they ever occur. Proactive actions can include:
- Gathering feedback from IT customers
- Capacity and performance monitoring
- Trend analysis of Incidents and Events
- Design analysis of existing services
Problem
A Problem is an unknown underlying cause of multiple, persistent, or Major Incidents, including observed or monitored Events that may lead to Major Incidents.
Reactive Problem Management
Reactive Problem Management seeks to eliminate the causes of problems that have already occurred in order to prevent the future recurrence of related Incidents. The reactive aspect is concerned with solving Problems in response to one or more Events or Incidents.
Root Cause
The underlying or original cause(s) of Events or Incidents.
Trend Analysis
Trend Analysis is an aspect of Proactive Problem Management. It uses available data and tools to identify and report on operational trends that may indicate the existence of an error or a developing Problem in the environment. These tools and data sources may include:
- System monitoring tools
- Event Management data
- Incident/Service Desk data
- Any other available and useful information
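As a minimal sketch of trend analysis over Incident/Service Desk data, the example below counts Incidents that share a symptom within a recent window and flags any symptom that recurs often enough to suggest a developing Problem. Everything in it is an assumption made for illustration: the `Incident` fields, the symptom labels, the ticket numbers, and the threshold of three are hypothetical, and real data would come from the monitoring and ticketing tools listed above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    """Hypothetical, simplified Incident record."""
    ticket_id: str
    opened: date
    symptom: str  # assumed to be a normalized symptom label


def recurring_symptoms(incidents, since, threshold=3):
    """Return {symptom: count} for symptoms seen at least `threshold`
    times in Incidents opened on or after `since`."""
    counts = Counter(i.symptom for i in incidents if i.opened >= since)
    return {symptom: n for symptom, n in counts.items() if n >= threshold}


# Made-up sample data for illustration only.
incidents = [
    Incident("INC-1", date(2024, 1, 3), "vpn-timeout"),
    Incident("INC-2", date(2024, 1, 4), "vpn-timeout"),
    Incident("INC-3", date(2024, 1, 5), "printer-offline"),
    Incident("INC-4", date(2024, 1, 8), "vpn-timeout"),
    Incident("INC-5", date(2023, 11, 20), "printer-offline"),
]

candidates = recurring_symptoms(incidents, since=date(2024, 1, 1), threshold=3)
print(candidates)  # {'vpn-timeout': 3}
```

A symptom returned here is only a candidate for review; whether it is a true Problem is still a judgment for the Problem Manager.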
Workaround
A Workaround (sometimes called a temporary fix) is a sanctioned fix or technique to mitigate the negative impacts on the customer(s) relating to an Incident or Problem. Workarounds must be documented in the Knowledge Management System.