SAID for CFC

Background

A large number of forwarding failures that cannot recover automatically can occur on a network. When they do, services are interrupted and are not restored for a long time. A mechanism is therefore required to detect such failures. After a failure of a forwarding entry (such as a route forwarding entry or an ARP forwarding entry) is detected, proper measures are taken to rectify the fault quickly.

Definition

The control plane with forwarding plane consistency check (CFC) service node is a specific service node in the SAID framework. The CFC node selects some typical routes and compares the outbound interface, MAC address, and label encapsulation information on the control plane with that on the forwarding plane. If the information is inconsistent, the system enters the diagnosis state and performs the consistency check multiple times. If the inconsistency persists, an alarm is generated.

Principles

The SAID system diagnoses the CFC service node in three phases: flow selection, check, and troubleshooting. In this way, devices can perform automatic diagnosis, collect fault information automatically, and generate alarms.

  • Flow selection

    There are a large number of routes on the live network. The system selects typical routes for the check.

    Routes are selected in the following order of priority: default route > 32-bit direct route > static route > private route > others.

    A maximum of 4000 flows can be selected in total, and each type of flow has a limited quota. The system delivers a flow selection task based on the standard quota of each type of flow. After the selection results are summarized, any quota that a type of flow has not used up is reallocated to other types of flows.
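
The selection rules above can be sketched as follows. This is only an illustration, not the device's implementation: the per-type quota values, the route type names, and the choice to hand out spare quota in priority order are all assumptions, because the source states only the 4000-flow total and the priority order.

```python
# Route types in the priority order given in the text (highest first).
PRIORITY = ["default", "direct32", "static", "private", "other"]

TOTAL_FLOWS = 4000  # overall cap stated in the text

# Hypothetical per-type quotas (they sum to 4000); real values are not documented here.
STANDARD_QUOTA = {
    "default": 100,
    "direct32": 1400,
    "static": 1000,
    "private": 1000,
    "other": 500,
}


def select_flows(candidates, quota=STANDARD_QUOTA, total=TOTAL_FLOWS):
    """Pick flows per type, then reallocate unused quota.

    candidates maps a route type to the list of routes of that type.
    """
    # First pass: each type takes at most its standard quota.
    selected = {t: list(candidates.get(t, [])[:quota[t]]) for t in PRIORITY}

    # Second pass: quota left unused is offered to types that still have
    # unselected routes (assumed here to be offered in priority order).
    spare = total - sum(len(v) for v in selected.values())
    for t in PRIORITY:
        if spare <= 0:
            break
        pool = candidates.get(t, [])
        already = len(selected[t])
        extra = pool[already:already + spare]
        selected[t].extend(extra)
        spare -= len(extra)
    return selected


candidates = {
    "default": ["0.0.0.0/0"],
    "static": [f"static-{i}" for i in range(3000)],
    "other": [f"other-{i}" for i in range(2000)],
}
result = select_flows(candidates)
```

In this example no 32-bit direct or private routes exist, so their quotas are reused: all 3000 static routes are selected first, and the remaining slots go to the lowest-priority type.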

  • Check

    After summarizing the flow selection results of interface boards and obtaining the final flow set to be checked, the main control board broadcasts the flow selection information to each interface board. The interface boards start to check the flows.

    Data on the control plane is inconsistent with that on the forwarding plane in the following situations:
    1. The forwarding plane has the outbound interface, MAC address, and label encapsulation information, but the control plane does not.
    2. Data on the forwarding plane is incorrect (for example, an entry is invalid), and no hardware forwarding result is obtained.

    If the outbound interface, MAC address, and label encapsulation information can be obtained, the data is compared with that on the control plane. In normal cases, the data on the forwarding plane is the same as or is a subset of that on the control plane.
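
The comparison rule above can be expressed as a subset test. The sketch below is a simplified model, not the actual check logic: the `NextHop` type, its field names, and the sample values are hypothetical, and treating an empty forwarding result the same as a failed hardware lookup is an assumption.

```python
from typing import Iterable, NamedTuple, Optional


class NextHop(NamedTuple):
    """Hypothetical record of the three items the text says are compared."""
    out_interface: str
    mac: str
    label: Optional[int]  # None when no label encapsulation applies


def is_consistent(control: Iterable[NextHop],
                  forwarding: Optional[Iterable[NextHop]]) -> bool:
    """Return True if forwarding-plane data matches the control plane."""
    forwarding_set = set(forwarding) if forwarding is not None else set()
    # Situation 2: no hardware forwarding result was obtained.
    # (An empty result is treated as a failure; this is an assumption.)
    if not forwarding_set:
        return False
    # Normal case: forwarding data equals, or is a subset of, control data.
    # Situation 1 (forwarding has data the control plane lacks) fails this test.
    return forwarding_set <= set(control)


hop_a = NextHop("GE1/0/1", "00e0-fc12-3456", 1024)
hop_b = NextHop("GE1/0/2", "00e0-fc12-3457", 1025)
hop_c = NextHop("GE1/0/3", "00e0-fc12-3458", None)

same = is_consistent([hop_a, hop_b], [hop_a, hop_b])
subset = is_consistent([hop_a, hop_b], [hop_a])
extra_on_forwarding = is_consistent([hop_a], [hop_a, hop_c])
no_hardware_result = is_consistent([hop_a], None)
```
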
  • Troubleshooting

    After a fault is detected, the context information related to the fault is collected. Then, the device enters the diagnosis state and repeatedly checks the faulty flow. If an entry error occurs three consecutive times, the device enters the recovery state. If any one of these checks finds no error, the flow is considered normal and no further diagnosis is required.

    After the fault is diagnosed, you can run commands to restart the interface to rectify the fault.

    After the fault recovery action is performed, the current flow is checked again once it has remained stable and unchanged for 5 minutes. If the fault persists, an alarm is generated and the context information related to the fault is collected. If the fault is rectified, the system enters the detection state again and continues to check the subsequent flows.

    After an alarm is generated, the SAID system keeps checking the current flow until the flow is correct. Then, the alarm is cleared and the system enters the detection state.
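
The troubleshooting flow described above behaves like a small state machine, sketched below for a single flow. The sketch makes several assumptions that the text does not settle: the state names are illustrative, the three consecutive errors are counted from the moment the diagnosis state is entered, and the recovery action and the 5-minute stability wait are not modeled (the check made in the recovery state stands for the recheck that follows them).

```python
from enum import Enum


class State(Enum):
    DETECTION = "detection"
    DIAGNOSIS = "diagnosis"
    RECOVERY = "recovery"
    ALARM = "alarm"


REQUIRED_FAILURES = 3  # consecutive entry errors before recovery, per the text


def step(state, failures, ok):
    """Advance one check. ok is True when the flow is consistent.

    Returns (next_state, consecutive_failures_in_diagnosis).
    """
    if ok:
        # A passing check always returns to detection; in the alarm state
        # this is the point at which the alarm is cleared.
        return State.DETECTION, 0
    if state is State.DETECTION:
        return State.DIAGNOSIS, 0  # fault found: collect context, diagnose
    if state is State.DIAGNOSIS:
        failures += 1
        if failures >= REQUIRED_FAILURES:
            return State.RECOVERY, 0
        return State.DIAGNOSIS, failures
    # RECOVERY: the recheck after the recovery action still fails, so an
    # alarm is generated. ALARM: keep checking until the flow is correct.
    return State.ALARM, 0


def trace(results):
    """Return the state reached after each check result."""
    state, failures, states = State.DETECTION, 0, []
    for ok in results:
        state, failures = step(state, failures, ok)
        states.append(state.value)
    return states


persistent = trace([False, False, False, False, False, False, True])
transient = trace([False, True])
recovered = trace([False, False, False, False, True])
```

Here `persistent` models a fault that survives the recovery action and raises an alarm, `transient` a fault that disappears during diagnosis, and `recovered` a fault that the recovery action rectifies.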

Copyright © Huawei Technologies Co., Ltd.