Many forwarding failures that occur on a network cannot recover automatically. As a result, services are interrupted and remain down for a long time. A mechanism is therefore required to detect forwarding failures that cannot recover automatically. After a forwarding entry failure is detected (for example, a failure of a route forwarding entry or an ARP forwarding entry), proper measures are taken to rectify the fault quickly.
The control plane and forwarding plane consistency check (CFC) service node is a specific service node in the SAID framework. The CFC node selects some typical routes and compares the outbound interface, MAC address, and label encapsulation information on the control plane with the corresponding information on the forwarding plane. If the information is inconsistent, the system enters the diagnosis state and repeats the consistency check several times. If the inconsistency persists, an alarm is generated.
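The comparison described above can be sketched as a field-by-field check. This is a minimal illustration, not the device implementation: the `EntryInfo` type, its field names, and the sample interface, MAC address, and label values are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class EntryInfo:
    """Hypothetical view of one entry on either plane."""
    out_interface: str
    mac_address: str
    label: Optional[int]  # None when the entry carries no label encapsulation


def mismatched_fields(control: EntryInfo, forwarding: EntryInfo) -> List[str]:
    """Return the names of the fields that differ between the two planes."""
    return [
        f.name
        for f in fields(EntryInfo)
        if getattr(control, f.name) != getattr(forwarding, f.name)
    ]


# Illustrative values only: the label differs between the two planes.
control = EntryInfo("GE1/0/0", "00e0-fc12-3456", 1024)
forwarding = EntryInfo("GE1/0/0", "00e0-fc12-3456", 1025)
print(mismatched_fields(control, forwarding))  # ['label']
```

An empty result means the two planes agree for that entry. A non-empty result corresponds to the inconsistency that moves the system into the diagnosis state.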
The SAID system diagnoses the CFC service node in three phases: flow selection, check, and troubleshooting. This allows devices to perform diagnosis, collect fault information, and generate alarms automatically.
Flow selection
Because the live network carries a large number of routes, the system selects typical routes for the check.
Routes are selected based on the following priorities.
A maximum of 4000 flows can be selected in total, and each type of flow has a limited quota. The system delivers a flow selection task based on the standard quota of each type of flow. If the quota of one flow type is not used up, the unused quota is reallocated to other flow types after the results are summarized.
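The quota handling can be illustrated with a short sketch. Only the 4000-flow total comes from the description above. The flow type names, the per-type quotas, and the rule that spare quota is handed out in the order the types are listed are assumptions made for the example.

```python
from typing import Dict

TOTAL_FLOWS = 4000  # total number of flows that can be selected


def allocate(quotas: Dict[str, int], candidates: Dict[str, int]) -> Dict[str, int]:
    """Select flows per type, then reuse any unused quota for other types.

    quotas:     standard quota per flow type (assumed to sum to TOTAL_FLOWS)
    candidates: number of flows of each type actually available
    """
    # First pass: each type takes at most its standard quota.
    selected = {t: min(quotas[t], candidates.get(t, 0)) for t in quotas}

    # Second pass: spare quota goes to types that still have candidates.
    # Assumption: types are served in the order they appear in `quotas`.
    spare = TOTAL_FLOWS - sum(selected.values())
    for t in quotas:
        if spare == 0:
            break
        extra = min(spare, candidates.get(t, 0) - selected[t])
        selected[t] += extra
        spare -= extra
    return selected


# Hypothetical flow types and quotas.
quotas = {"type_a": 2000, "type_b": 1500, "type_c": 500}
candidates = {"type_a": 500, "type_b": 3000, "type_c": 900}
result = allocate(quotas, candidates)
print(result)  # {'type_a': 500, 'type_b': 3000, 'type_c': 500}
```

In this example, type_a uses only 500 of its 2000 flows, so the remaining 1500 are reused by type_b, and the total stays within 4000.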
Check
After summarizing the flow selection results of interface boards and obtaining the final flow set to be checked, the main control board broadcasts the flow selection information to each interface board. The interface boards start to check the flows.
Troubleshooting
After a fault occurs, the context information related to the fault is collected. Then, the device enters the diagnosis state and repeatedly checks the faulty flow. If an entry error is detected three consecutive times, the device enters the recovery state. If any one of these checks finds no error, the flow is considered normal and no further diagnosis is required.
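The diagnosis rule can be sketched as a small loop. The threshold of three consecutive errors comes from the description above. The `check_flow` callback and the returned state names are hypothetical and serve only to illustrate the logic.

```python
from typing import Callable

CONSECUTIVE_ERRORS_REQUIRED = 3  # three consecutive entry errors trigger recovery


def diagnose(check_flow: Callable[[], bool]) -> str:
    """Re-check a suspect flow while in the diagnosis state.

    check_flow is a hypothetical callback that returns True when the
    control plane and forwarding plane entries are consistent.
    """
    for _ in range(CONSECUTIVE_ERRORS_REQUIRED):
        if check_flow():
            # A single clean check is enough to consider the flow normal.
            return "normal"
    # Every check failed, so the error occurred three consecutive times.
    return "recovery"


# Two errors followed by a clean check: the flow is considered normal.
transient = iter([False, False, True])
print(diagnose(lambda: next(transient)))  # normal

# Three errors in a row: the device enters the recovery state.
persistent = iter([False, False, False])
print(diagnose(lambda: next(persistent)))  # recovery
```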
After the fault is diagnosed, you can run commands to restart the interface to rectify the fault.
After the fault recovery action is performed, the current flow is checked again once it has remained stable and unchanged for 5 minutes. If the fault persists, an alarm is generated and the context information related to the fault is collected. If the fault is rectified, the system enters the detection state again and continues to check the subsequent flows.
After an alarm is generated, the SAID system keeps checking the current flow until the flow is correct. Then, the alarm is cleared and the system enters the detection state.
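The behavior after a recovery action can be traced with a small sketch. The 5-minute stability period and the recovery, alarm, and detection states come from the description above. The function names and the representation of check results as booleans are assumptions made for the example.

```python
from enum import Enum
from typing import Iterable, List

STABLE_SECONDS = 5 * 60  # the flow must stay unchanged for 5 minutes


class State(Enum):
    RECOVERY = "recovery"
    ALARM = "alarm"
    DETECTION = "detection"


def ready_for_recheck(seconds_since_last_change: float) -> bool:
    """The flow is re-checked only after it has been stable long enough."""
    return seconds_since_last_change >= STABLE_SECONDS


def post_recovery_states(check_results: Iterable[bool]) -> List[State]:
    """Trace the states visited after a recovery action.

    check_results holds the outcome of each re-check (True = flow correct).
    """
    trace = [State.RECOVERY]
    for ok in check_results:
        if ok:
            # The flow is correct: any alarm is cleared, detection resumes.
            trace.append(State.DETECTION)
            break
        if trace[-1] is not State.ALARM:
            # The fault persists after recovery: raise the alarm once.
            trace.append(State.ALARM)
    return trace


# The fault persists for two checks and then clears.
print([s.value for s in post_recovery_states([False, False, True])])
# ['recovery', 'alarm', 'detection']
```

If the first re-check already succeeds, the trace goes straight from recovery to detection and no alarm is raised.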