FEI_1.3.6.1.4.1.2011.5.25.157.2.222 hwLinkHeartbeatDropAlarm

Trap Buffer Description

The link-heartbeat function detected that the packet loss rate reached or exceeded the threshold. (InterfaceIfIndex=[InterfaceIfIndex], InterfaceName=[InterfaceName], SendInterfaceName=[SendInterfaceName], RecvInterfaceName=[RecvInterfaceName])

The link-heartbeat function detected that the packet loss rate reached or exceeded the threshold.

Trap Attributes

Trap Attribute    Description
Alarm or Event    Alarm
Trap Severity     Critical
Mnemonic Code     hwLinkHeartbeatDropAlarm
Trap OID          1.3.6.1.4.1.2011.5.25.157.2.222
MIB               HUAWEI-PORT-MIB
Alarm ID          0x09ae2002
Alarm Name        hwLinkHeartbeatDropAlarm
Alarm Type        qualityOfServiceAlarm
Raise or Clear    Raise
Match trap        FEI_1.3.6.1.4.1.2011.5.25.157.2.223 hwLinkHeartbeatDropAlarmResume

Trap Buffer Parameters

Parameter            Description
InterfaceIfIndex     Interface index (ifIndex).
InterfaceName        Interface name.
SendInterfaceName    Name of the interface that sends packets.
RecvInterfaceName    Name of the interface that receives packets.
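The parameters above arrive as key=value pairs inside the parentheses of the trap buffer text. The following is a minimal parsing sketch, not an official tool. It assumes the parameter list is the last parenthesised block in the message and that values contain no commas or parentheses; the function name and the sample interface values are hypothetical.

```python
import re

# Assumption: each parameter has the form Name=Value, separated by commas,
# and values contain no commas or parentheses.
PARAM_RE = re.compile(r"(\w+)=([^,()]*)")


def parse_trap_params(message: str) -> dict:
    """Return the key=value pairs from the trailing parenthesised block."""
    start = message.rfind("(")
    end = message.rfind(")")
    if start == -1 or end < start:
        return {}
    return {k: v.strip() for k, v in PARAM_RE.findall(message[start + 1:end])}


# Hypothetical sample; the interface names and index are illustrative only.
sample = (
    "The link-heartbeat function detected that the packet loss rate reached "
    "or exceeded the threshold. (InterfaceIfIndex=7, "
    "InterfaceName=GigabitEthernet0/1/0, "
    "SendInterfaceName=GigabitEthernet0/1/0, "
    "RecvInterfaceName=GigabitEthernet0/1/1)"
)
params = parse_trap_params(sample)
```

Real log lines may carry additional prefixes such as a timestamp or host name; searching from the right for the parentheses keeps the sketch tolerant of those.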

VB Parameters

VB OID                                  VB Name                       VB Index
1.3.6.1.4.1.2011.5.25.157.1.24.1.1.1    hwLinkHeartbeatIfindex        hwLinkHeartbeatIfindex
1.3.6.1.4.1.2011.5.25.157.1.24.1.1.2    hwLinkHeartbeatIfName         hwLinkHeartbeatIfindex
1.3.6.1.4.1.2011.5.25.157.1.24.1.1.3    hwLinkHeartbeatTxInterface    hwLinkHeartbeatIfindex
1.3.6.1.4.1.2011.5.25.157.1.24.1.1.4    hwLinkHeartbeatRxInterface    hwLinkHeartbeatIfindex

Impact on the System

Fault-triggered packet loss occurs on the link with link-heartbeat detection enabled, which may affect service forwarding.

Possible Causes

Fault detection

The ping service node performs link-heartbeat loopback detection to detect service faults. The packets used are ICMP detection packets. There are 12 packet templates in total. Each template sends two packets in sequence within a period of 30s. Therefore, a total of 24 packets are sent by the 12 templates within a period of 30s. After five periods, the system starts to collect statistics on lost packets and modified packets.

Link-heartbeat loopback detection is classified as packet modification detection or packet loss detection.

Packet loss detection checks whether the difference between the number of received heartbeat packets and the number of sent heartbeat packets is within the permitted range. If one of the following conditions is met, a trigger message is sent to instruct the SAID ping node to perform fault diagnosis:

1:The total number of lost packets exceeds 3.

2:After each packet sending period ends, the system checks the protocol status and whether ARP entries exist on the interface, and finds that no ARP entry exists for three consecutive periods.

3:The absolute value of the difference between the number of lost packets whose payload is all 0s and the number of lost packets whose payload is all Fs is greater than 25% of the total number of sent packets in five periods.
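The three trigger conditions above can be sketched as a single check. This is an illustration under stated assumptions, not the device implementation: the class and field names are hypothetical, and the source does not say how the individual counters relate to each other, so each condition is evaluated independently. The constants come from the text (12 templates, two packets per template per 30s period, statistics over five periods), which gives 24 packets per period and 120 packets per five-period window.

```python
from dataclasses import dataclass

TEMPLATES = 12              # packet templates (from the text)
PACKETS_PER_TEMPLATE = 2    # packets each template sends per period
PERIOD_SECONDS = 30
WINDOW_PERIODS = 5          # periods over which statistics are collected

PACKETS_PER_PERIOD = TEMPLATES * PACKETS_PER_TEMPLATE      # 24
PACKETS_PER_WINDOW = PACKETS_PER_PERIOD * WINDOW_PERIODS   # 120


@dataclass
class WindowStats:
    """Hypothetical counters for one five-period statistics window."""
    lost_total: int
    lost_all_zero_payload: int
    lost_all_f_payload: int
    consecutive_periods_without_arp: int


def should_trigger_diagnosis(s: WindowStats) -> bool:
    # Condition 1: the total number of lost packets exceeds 3.
    if s.lost_total > 3:
        return True
    # Condition 2: no ARP entry for three consecutive periods.
    if s.consecutive_periods_without_arp >= 3:
        return True
    # Condition 3: |lost all-0s - lost all-Fs| > 25% of packets sent in five periods.
    skew = abs(s.lost_all_zero_payload - s.lost_all_f_payload)
    return skew > 0.25 * PACKETS_PER_WINDOW
```

With these constants, the 25% threshold in condition 3 works out to 30 packets.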

Fault diagnosis

After receiving the trigger message in the fault detection state, the ping service node enters the fault diagnosis state.

1:If a packet loss error is detected on the device, the SAID ping node checks whether a module (subcard, TM, or NP) on the device is faulty. If no module is faulty, the system completes the diagnosis and returns to the fault detection state.

2:If a packet loss error is detected on the device, the SAID ping node checks whether a module (subcard, TM, or NP) on the device is faulty. If a module fault occurs, the system performs loopback diagnosis. If packet loss or modification is detected during loopback, the local device is faulty. The system then enters the fault recovery state. If no packet is lost during loopback diagnosis, the system returns to the fault detection state.

Fault recovery

If a fault is detected during loopback diagnosis, the ping service node determines whether a counting error occurs on the associated subcard.

1:If a counting error occurs on the subcard, the ping service node resets the subcard for service recovery. Then, the node enters the service recovery determination state and performs link-heartbeat loopback detection to determine whether services recover. If services recover, the node returns to the fault detection state. If services do not recover, the node returns to the fault recovery state and takes a secondary recovery action. (For a subcard reset, the secondary recovery action is board reset.)

2:If no counting error occurs on the subcard, the ping service node resets the involved board for service recovery. After the board starts, the node enters the service recovery determination state and performs link-heartbeat loopback detection to determine whether services recover. If services recover, the node returns to the fault detection state. If services do not recover, the node remains in the service recovery determination state and periodically performs link-heartbeat loopback detection until services recover.

Service recovery determination

After fault recovery is complete, the ping service node uses the fault packet template to send diagnostic packets. If a fault still exists and a subcard reset has been performed, the node generates an alarm and instructs the subcard to perform a switchover for self-healing. If a fault still exists but no subcard reset has been performed, the node only generates an alarm. If no fault exists, the node instructs the link-heartbeat loopback function to return to the initial state, and the node itself returns to the fault detection state.
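The four states described above (fault detection, fault diagnosis, fault recovery, and service recovery determination) can be summarized as a small state machine. The sketch below is a simplified reading of the text, not the device logic: the enum, function, and parameter names are hypothetical, and alarm generation and the subcard switchover are left out.

```python
from enum import Enum, auto


class State(Enum):
    FAULT_DETECTION = auto()
    FAULT_DIAGNOSIS = auto()
    FAULT_RECOVERY = auto()
    RECOVERY_DETERMINATION = auto()


def next_state(state, *, triggered=False, module_faulty=False,
               loopback_fault=False, services_recovered=False,
               last_reset="subcard"):
    """Return the next state; last_reset is 'subcard' or 'board' (assumed labels)."""
    if state is State.FAULT_DETECTION:
        # A trigger message moves the node into fault diagnosis.
        return State.FAULT_DIAGNOSIS if triggered else state
    if state is State.FAULT_DIAGNOSIS:
        # Only a faulty module plus a fault seen during loopback leads to recovery.
        if module_faulty and loopback_fault:
            return State.FAULT_RECOVERY
        return State.FAULT_DETECTION
    if state is State.FAULT_RECOVERY:
        # After the reset action, the node checks whether services recovered.
        return State.RECOVERY_DETERMINATION
    # State.RECOVERY_DETERMINATION
    if services_recovered:
        return State.FAULT_DETECTION
    # After a subcard reset, fall back to recovery for the secondary action
    # (board reset); after a board reset, keep re-checking in this state.
    return State.FAULT_RECOVERY if last_reset == "subcard" else state
```

For example, a trigger followed by a confirmed loopback fault walks the node from fault detection through diagnosis and recovery into service recovery determination.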

Fault alarm

If link-heartbeat loopback detection finds packet loss, it triggers SAID ping diagnosis and recovery operations (a subcard or board reset). If services still fail to recover after these operations and the device continues to detect packet loss, the device reports this alarm.

Procedure

1.Run the display link-heartbeat command to check information about link-heartbeat packet loss on the interface. Check whether link-heartbeat packets are lost for five consecutive intervals (150s).

  • If no link-heartbeat packets are lost for five consecutive intervals (150s), the trap is cleared. Go to Step 3.
  • If link-heartbeat packets are lost for five consecutive intervals (150s), the trap is not cleared. Go to Step 2.

2.Collect the alarm information, log information, and configuration information, and then contact technical support personnel.

3.End.
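The check in Step 1 can be sketched as follows. This is only an illustration: the per-interval loss counts are a hypothetical input, because the output format of the display link-heartbeat command is not described here, and treating any window that is not fully loss-free as "not cleared" is an assumption.

```python
INTERVAL_SECONDS = 30
REQUIRED_CLEAN_INTERVALS = 5  # five consecutive intervals (150s)


def trap_should_clear(lost_per_interval):
    """True if the five most recent intervals all show zero lost packets.

    lost_per_interval: lost-packet counts, oldest first (hypothetical input).
    """
    recent = lost_per_interval[-REQUIRED_CLEAN_INTERVALS:]
    return (len(recent) == REQUIRED_CLEAN_INTERVALS
            and all(n == 0 for n in recent))
```

If fewer than five intervals have been observed, the sketch reports that the trap is not yet cleared rather than guessing.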

Copyright © Huawei Technologies Co., Ltd.