SAID for SEU

Background

As the manufacturing processes for electronic components move towards deep submicron, the per-unit soft failure rate of storage units in these components keeps increasing. As a result, single event upset (SEU) faults occur often and adversely affect services.

Definition

If a subcard encounters an SEU fault, SAID for SEU performs loopback detection on all interfaces of the subcard. If packets are lost or modified during loopback detection, the subcard is reset to rectify the fault.

Principles

The SAID system diagnoses an SEU fault in three phases: fault detection, loopback detection, and troubleshooting. This enables devices to diagnose faults and collect fault information automatically.

  • Fault detection

    SAID for SEU detects an SEU fault on a logical subcard and starts loopback detection.

  • Loopback detection

    In loopback detection, the CPU on the involved interface board sends ICMP packets to an interface on the faulty subcard, and the interface loops the ICMP packets back to the CPU.

  • Troubleshooting

    1. If packets are lost or modified, SAID for SEU performs one of the following operations, depending on the physical status of the involved interface:
      1. If the interface is physically Up, SAID for SEU resets the subcard.
      2. If the interface is physically Down, SAID for SEU keeps the interface Down until the fault is rectified.
    2. If the statistics on sent and received loopback packets are normal and the packets pass verification, the subcard does not need to be reset.
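
The troubleshooting decision above can be sketched in a few lines of Python. This is only an illustrative model, not the device implementation: the names LoopbackResult, Action, and decide_action are hypothetical, and the sketch assumes that packet loss means fewer packets came back than were sent and that modification means a returned payload differs from the one sent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Action(Enum):
    """Possible outcomes of the troubleshooting phase (hypothetical names)."""
    NO_RESET = "subcard does not need to be reset"
    RESET_SUBCARD = "reset the subcard"
    KEEP_INTERFACE_DOWN = "keep the interface Down until the fault is rectified"


@dataclass
class LoopbackResult:
    """Payloads sent by the CPU and payloads looped back to it (assumed model)."""
    sent: Sequence[bytes]
    received: Sequence[bytes]

    def packets_lost(self) -> bool:
        # Assumption: loss means fewer packets were looped back than were sent.
        return len(self.received) < len(self.sent)

    def packets_modified(self) -> bool:
        # Assumption: modification means a looped-back payload differs
        # from the payload sent at the same position.
        return any(s != r for s, r in zip(self.sent, self.received))


def decide_action(result: LoopbackResult, interface_physically_up: bool) -> Action:
    """Map a loopback result and the interface status to a troubleshooting action."""
    if result.packets_lost() or result.packets_modified():
        if interface_physically_up:
            return Action.RESET_SUBCARD
        return Action.KEEP_INTERFACE_DOWN
    # Statistics are normal and verification passed.
    return Action.NO_RESET


if __name__ == "__main__":
    corrupted = LoopbackResult(sent=[b"ping-1"], received=[b"pinG-1"])
    print(decide_action(corrupted, interface_physically_up=True).value)
```

Keeping the decision in a pure function separates it from how the ICMP packets are actually sent and looped back, which this sketch does not model.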
Copyright © Huawei Technologies Co., Ltd.