Internet-Draft Incident Terminology January 2024
Davis & Farrel Expires 21 July 2024
Workgroup:
Network Working Group
Internet-Draft:
draft-davis-nmop-incident-terminology-00
Published:
Intended Status:
Informational
Expires:
21 July 2024
Authors:
N. Davis
Ciena
A. Farrel
Old Dog Consulting

Some Key Terms for Incident Management

Abstract

This document sets out some key terms that are fundamental to a common understanding of Incident Management.

The purpose of this document is to bring clarity to discussions and other work related to Incident Management, in particular YANG models and management protocols that report, make visible, or manage incidents.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 21 July 2024.

1. Introduction

Incident Management is an important aspect of network management and control solutions. It deals with the reporting, inspection, correlation, and management of events within the network where those events have a negative effect on the network's ability to forward traffic in an optimal way. Incident Management also includes actions taken to work toward recovery of optimal network behavior.

A number of work efforts within the IETF seek to provide components of an Incident Management system, such as YANG models or management protocols. It is important that a common terminology is used so that there is a clear understanding of how the elements of the management and control solutions fit together, and how the incidents will be handled.

This document sets out some key terms that are fundamental to a common understanding of Incident Management.

2. Terminology

The terms below are presented in an order intended to build understanding when read from top to bottom.

Resource:

A component or commodity that can be used in a valuable way in the performance of some activity.

State:

A particular condition that something is in (at a specific time).

Change:

A modification to the state of a resource over time.

  • Most changes are not noteworthy (and are not relevant).

  • Perception of change depends upon the sampling rate/accuracy/detail and perspective.
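As an illustration of the second point, the following minimal Python sketch counts the changes an observer perceives when the same state history is sampled at different rates. The state history, the state names, and the function count_perceived_changes are hypothetical and are not defined by this document.

```python
from typing import List


def count_perceived_changes(history: List[str], sample_every: int) -> int:
    """Count state changes visible when only every Nth state is sampled."""
    samples = history[::sample_every]
    # A change is perceived whenever two consecutive samples differ.
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


# Hypothetical per-second state of one resource, with a brief "down" at t=3.
history = ["up", "up", "up", "down", "up", "up", "up", "up"]

# Sampling every second sees both up->down and down->up.
fine = count_perceived_changes(history, sample_every=1)
# Sampling every two seconds reads t=0, 2, 4, 6 and never sees "down".
coarse = count_perceived_changes(history, sample_every=2)
```

With these assumed values, the fine-grained observer perceives two changes while the coarse observer perceives none, although the resource behaved identically in both cases.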

Occurrence:

A particular relevant change.

  • The change is potentially without a plan or intent.

  • An occurrence may be an aggregation or abstraction of smaller occurrences.

  • Applies to all scales and scopes, i.e., is essentially fractal (can recurse indefinitely).

  • Note that occurrence is used here with respect to the temporal dimension.

Event:

The state modification in an occurrence.

  • Whereas a change takes place over a period of time, an event happens at a measurable instant.

Incident:

An event that has a negative effect, i.e., an effect that is not as required/desired.
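The relationship between the terms defined so far (Change, Occurrence, Event, and Incident) could be sketched as a small data model. This is one possible reading and is not a normative model: the class names, field names, resource names, and the choice to stamp an event with the instant at which the change completed are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Change:
    """A modification to the state of a resource over a period of time."""
    resource: str
    old_state: str
    new_state: str
    start: float            # seconds; when the modification began
    end: float              # seconds; when the modification completed
    relevant: bool = False  # assumption: relevance is judged by the observer


@dataclass(frozen=True)
class Event:
    """The state modification in an occurrence, pinned to one instant."""
    resource: str
    old_state: str
    new_state: str
    at: float
    negative_effect: bool = False  # assumption: judged against what is required


def occurrences(changes: List[Change]) -> List[Change]:
    """An occurrence is a relevant change; most changes are filtered out."""
    return [c for c in changes if c.relevant]


def to_event(occurrence: Change, negative_effect: bool = False) -> Event:
    # Assumption: the event is stamped with the instant the change completed.
    return Event(occurrence.resource, occurrence.old_state,
                 occurrence.new_state, occurrence.end, negative_effect)


def incidents(events: List[Event]) -> List[Event]:
    """An incident is an event that has a negative effect."""
    return [e for e in events if e.negative_effect]


# Hypothetical changes: a small fan speed drift and a port going down.
changes = [
    Change("fan-1", "2000rpm", "2010rpm", 0.0, 1.0),
    Change("port-1", "up", "down", 5.0, 5.2, relevant=True),
]
events = [to_event(o, negative_effect=(o.new_state == "down"))
          for o in occurrences(changes)]
found = incidents(events)
```

In this sketch only the port change is an occurrence, and because its event is judged to have a negative effect it is also the single incident.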

Problem:

A state regarded as undesirable that needs to be dealt with and overcome.

  • There is a need to change to a desirable/appropriate state.

  • Note that there is a historic aspect to this. The current state may be operational, but if there was a failure that remains unexplained, the network is in a state of unexplained recent failure. Although the network has recovered, that state is itself a problem.

  • Note that whilst a problem is unresolved it requires attention. A record of a resolved problem may be maintained in a log of history.

  • Note that the network may be in a state which is considered to be a problem from several perspectives (e.g., there is loss of light causing services to fail). A state change (so that the light recovers) may cause the problem to be resolved from one perspective (the services are now operational) but may still leave the problem as unresolved from another perspective (because the loss of light has not been explained). There can be further developments (the reason for the temporary loss of light is traced to a microbend in the fiber that is repaired) that cause another problem to be resolved. But this leaves a final problem still unresolved (why did the microbend occur in the first place?).
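The loss-of-light example above could be tracked with a record that holds a separate resolution status for each perspective. The following Python sketch is illustrative only: the ProblemRecord class, its methods, and the perspective labels are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProblemRecord:
    """Tracks one undesirable state as seen from several perspectives."""
    description: str
    # Maps each perspective to True once it is resolved from that viewpoint.
    perspectives: Dict[str, bool] = field(default_factory=dict)

    def open_perspective(self, name: str) -> None:
        self.perspectives[name] = False

    def resolve(self, name: str) -> None:
        self.perspectives[name] = True

    def unresolved(self) -> List[str]:
        return [p for p, done in self.perspectives.items() if not done]

    def requires_attention(self) -> bool:
        # A problem requires attention while any perspective is unresolved.
        return bool(self.unresolved())


problem = ProblemRecord("loss of light")
problem.open_perspective("services failed")
problem.open_perspective("loss of light unexplained")

# The light recovers, so the services are operational again.
problem.resolve("services failed")

# The cause is traced to a microbend, which is repaired. That explains the
# loss of light but raises a new question.
problem.resolve("loss of light unexplained")
problem.open_perspective("cause of microbend unknown")
```

At the end of this sequence the record still requires attention, because the question of why the microbend occurred remains open. Resolved perspectives stay in the record, which corresponds to keeping a log of history.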

Alert:

The indication of the potential existence of a problem.

Notification:

Communication of a state change.

  • May be an alert.

Alarm:

An indication to a human operator highlighting the potential presence of a problem.

  • The alarm state change is an event.
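One possible reading of how Notification, Alert, and Alarm relate is sketched below in Python. The sketch assumes that an alert is a notification that indicates a potential problem, and that an alarm is an alert directed at a human operator. This document does not state that every alarm is carried as a notification, so that nesting, together with the Audience enumeration and all field names, is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Audience(Enum):
    SYSTEM = "system"
    HUMAN_OPERATOR = "human-operator"


@dataclass(frozen=True)
class Notification:
    """Communication of a state change."""
    resource: str
    old_state: str
    new_state: str
    indicates_potential_problem: bool = False
    audience: Audience = Audience.SYSTEM

    def is_alert(self) -> bool:
        # A notification may be an alert: it indicates a potential problem.
        return self.indicates_potential_problem

    def is_alarm(self) -> bool:
        # Assumption: an alarm is an alert aimed at a human operator.
        return self.is_alert() and self.audience is Audience.HUMAN_OPERATOR


# Hypothetical examples of each case.
routine = Notification("port-1", "admin-down", "admin-up")
alert = Notification("port-1", "up", "down",
                     indicates_potential_problem=True)
alarm = Notification("port-1", "up", "down",
                     indicates_potential_problem=True,
                     audience=Audience.HUMAN_OPERATOR)
```

Under these assumptions every alarm is an alert, but not every alert is an alarm, and a routine notification is neither.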

Transient:

A state, considered as a problem, that persists for a limited amount of time before becoming resolved without direct action by an operator or control system.

Intermittent:

A state that is not maintained, but keeps occurring in some meaningfully short time frame.
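The distinction between Transient and Intermittent could be expressed as a classification over the periods during which a resource was in a problem state. The Python sketch below is a minimal illustration under stated assumptions: the thresholds max_duration, recurrence_window, and min_repeats are arbitrary values, the label "persistent" is not a term defined by this document, and every period is assumed to have ended without direct action by an operator or control system.

```python
from typing import List, Tuple

# (start, end) in seconds of one period spent in the problem state.
Interval = Tuple[float, float]


def classify(intervals: List[Interval],
             max_duration: float = 10.0,
             recurrence_window: float = 60.0,
             min_repeats: int = 3) -> str:
    """Label a history of problem-state periods (illustrative thresholds)."""
    if not intervals:
        return "none"
    # Every period must be short-lived to count as transient or intermittent.
    if any(end - start > max_duration for start, end in intervals):
        return "persistent"  # hypothetical label, not defined in this document
    ordered = sorted(intervals)
    # Gap between the end of one period and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]
    # The state keeps recurring within a meaningfully short time frame.
    if len(ordered) >= min_repeats and all(g <= recurrence_window for g in gaps):
        return "intermittent"
    return "transient"


single_blip = classify([(0.0, 2.0)])
flapping = classify([(0.0, 2.0), (20.0, 21.0), (50.0, 53.0)])
long_outage = classify([(0.0, 500.0)])
```

What counts as "limited" or "meaningfully short" depends on the deployment, so a real system would need to choose these thresholds per resource type.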

Cause:

The activity, event, etc. that gives rise to an (undesired) event, condition, or behavior.

Detect:

To notice the presence of something (state, activity, form, etc.).

  • Hence also to notice a change (from the perspective of the viewer).

Condition:

The state of something with regard to its working order.

  • Here, this term is used where the state is an issue with operation. For example, "signal degraded" is a condition that indicates an issue with the operation.

3. Security Considerations

This document specifies terminology and has no direct effect on the security of implementations or deployments. However, protocol solutions and management models need to be aware of several aspects:

4. Privacy Considerations

In general, Incident Management will not expose information about end-user activities or user data. The main privacy concern is for a network operator to keep control of all information about incidents to protect their privacy and the details of how they operate their network.

5. IANA Considerations

This document makes no requests for IANA action.

Authors' Addresses

Nigel Davis
Ciena
United Kingdom
Adrian Farrel
Old Dog Consulting
United Kingdom