OSPF Neighbor Relationship Flapping Suppression

OSPF neighbor relationship flapping suppression works by delaying OSPF neighbor relationship reestablishment or setting the link cost to the maximum value.

Background

If an interface carrying OSPF services alternates between Up and Down, OSPF neighbor relationship flapping occurs on the interface. During the flapping, OSPF frequently sends Hello packets to reestablish the neighbor relationship, synchronizes LSDBs, and recalculates routes. In this process, a large number of packets are exchanged, adversely affecting neighbor relationship stability, OSPF services, and other OSPF-dependent services, such as LDP and BGP. OSPF neighbor relationship flapping suppression can address this problem by delaying OSPF neighbor relationship reestablishment or preventing service traffic from passing through flapping links.

Related Concepts

Flapping-event: reported when the status of a neighbor relationship on an interface last changes from Full to a non-Full state. The flapping-event triggers flapping detection.

Flapping-count: number of times flapping has occurred.

Detecting-interval: detection interval, used to determine whether a neighbor status change counts as a valid flapping-event.

Threshold: flapping suppression threshold. When the flapping-count reaches or exceeds the threshold, flapping suppression takes effect.

Resume-interval: interval for exiting from OSPF neighbor relationship flapping suppression. If the interval between two successive valid flapping-events is longer than resume-interval, the flapping-count is reset.

Implementation

Flapping detection

Each OSPF interface on which OSPF neighbor relationship flapping suppression is enabled starts a flapping counter. If the interval between two successive neighbor status changes from Full to a non-Full state is shorter than detecting-interval, a valid flapping-event is recorded, and the flapping-count increases by 1. When the flapping-count reaches or exceeds the threshold, flapping suppression takes effect. If the interval between two successive neighbor status changes from Full to a non-Full state is longer than resume-interval, the flapping-count is reset.

The detecting-interval, threshold, and resume-interval are configurable.

The value of resume-interval must be greater than that of detecting-interval.
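
The detection rules above can be sketched as a small per-interface counter. This is only an illustrative model, not the device implementation: the class name, the method name, the use of seconds as the time unit, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlapDetector:
    """Illustrative per-interface flapping counter (hypothetical model)."""
    detecting_interval: float          # assumed to be in seconds
    threshold: int
    resume_interval: float             # must be greater than detecting_interval
    flapping_count: int = 0
    last_event: Optional[float] = None

    def __post_init__(self) -> None:
        if self.resume_interval <= self.detecting_interval:
            raise ValueError("resume-interval must be greater than detecting-interval")

    def on_full_to_non_full(self, now: float) -> bool:
        """Record a Full -> non-Full change at time `now`.

        Returns True when the flapping-count has reached the threshold,
        that is, when flapping suppression should take effect.
        """
        if self.last_event is not None:
            gap = now - self.last_event
            if gap < self.detecting_interval:
                self.flapping_count += 1      # valid flapping-event
            elif gap > self.resume_interval:
                self.flapping_count = 0       # stable long enough: reset
        self.last_event = now
        return self.flapping_count >= self.threshold


# Sample values chosen for illustration only; they are not product defaults.
detector = FlapDetector(detecting_interval=60, threshold=3, resume_interval=120)
results = [detector.on_full_to_non_full(t) for t in (0, 10, 20, 30)]
```

In this sketch the first status change only starts the measurement, so three further changes, each less than detecting-interval apart, are needed before the threshold of 3 is reached.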

Flapping suppression

Flapping suppression works in either Hold-down or Hold-max-cost mode.

  • Hold-down mode: addresses frequent flooding and topology changes during neighbor relationship establishment. During Hold-down suppression, the interface prevents the neighbor relationship from being reestablished, which minimizes LSDB synchronization attempts and packet exchanges.
  • Hold-max-cost mode: addresses frequent changes of the traffic forwarding path. During Hold-max-cost suppression, the interface uses 65535 as the cost of the flapping link, which prevents traffic from passing through the flapping link.

Flapping suppression can also work first in Hold-down mode and then in Hold-max-cost mode.

By default, the Hold-max-cost mode takes effect. The mode and suppression duration can be changed manually.

If an attack causes frequent neighbor relationship flapping, Hold-down mode can minimize the impact of the attack.

When an interface enters the flapping suppression state, all neighbor relationships on the interface enter the state accordingly.
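
The difference between the two modes can be summarized as the effect each has on the flapping link. The following is a minimal sketch under the assumption that a suppressed link is modeled either as "no adjacency" or as "adjacency with maximum cost"; the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Optional

MAX_COST = 65535  # maximum link cost, as stated in the text


class Mode(Enum):
    NONE = "none"                    # not suppressed
    HOLD_DOWN = "hold-down"
    HOLD_MAX_COST = "hold-max-cost"


def advertised_cost(configured_cost: int, mode: Mode) -> Optional[int]:
    """Return the cost used for the link, or None if no adjacency may form."""
    if mode is Mode.HOLD_DOWN:
        return None                  # neighbor relationship is not reestablished
    if mode is Mode.HOLD_MAX_COST:
        return MAX_COST              # link stays up but is made unattractive
    return configured_cost
```

For example, `advertised_cost(10, Mode.HOLD_MAX_COST)` yields 65535, whereas `advertised_cost(10, Mode.HOLD_DOWN)` yields `None` because the link is not available for route calculation at all.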

Exiting from flapping suppression

Interfaces exit from flapping suppression in the following scenarios:

  • The suppression timer expires.
  • The corresponding OSPF process is reset.
  • An OSPF neighbor is reset.
  • A command is run to exit from flapping suppression.
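
The exit conditions, together with the option of running Hold-down first and Hold-max-cost afterwards, can be sketched as a function of the time elapsed since suppression started. The function name, the parameters, and the assumption that the two suppression durations simply run back to back are illustrative only.

```python
def suppression_state(elapsed: float, hold_down: float, hold_max_cost: float,
                      forced_exit: bool = False) -> str:
    """Return the suppression state `elapsed` seconds after suppression began.

    hold_down and hold_max_cost are the assumed durations of each phase
    (0 disables a phase). forced_exit models an OSPF process reset, a
    neighbor reset, or a command run to exit from flapping suppression.
    """
    if forced_exit:
        return "exited"
    if elapsed < hold_down:
        return "hold-down"
    if elapsed < hold_down + hold_max_cost:
        return "hold-max-cost"
    return "exited"                  # suppression timer expired
```

With a 10-second Hold-down phase followed by a 20-second Hold-max-cost phase, the interface in this model is in Hold-down at 5 seconds, in Hold-max-cost at 15 seconds, and has exited at 30 seconds.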

Typical Scenarios

Basic scenario

In Figure 1, the traffic forwarding path is Device A -> Device B -> Device C -> Device E before a link failure occurs. After the link between Device B and Device C fails, the forwarding path switches to Device A -> Device B -> Device D -> Device E. If the neighbor relationship between Device B and Device C frequently flaps at the early stage of the path switchover, the forwarding path will be switched frequently, causing traffic loss and affecting network stability. If the neighbor relationship flapping meets suppression conditions, flapping suppression takes effect.

  • If flapping suppression works in Hold-down mode, the neighbor relationship between Device B and Device C is prevented from being reestablished during the suppression period, in which traffic is forwarded along the path Device A -> Device B -> Device D -> Device E.
  • If flapping suppression works in Hold-max-cost mode, 65535 is used as the cost of the link between Device B and Device C during the suppression period, and traffic is forwarded along the path Device A -> Device B -> Device D -> Device E.

Figure 1 Flapping suppression in a basic scenario
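
The path change in the basic scenario can be reproduced with a shortest-path calculation. The sketch below assumes the five-device topology described above and uses made-up link costs (10 on the path through Device C, 20 on the path through Device D); the function names are hypothetical, and Hold-down is modeled by leaving the link between Device B and Device C out of the graph.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, int]]


def build_topology(bc_cost: Optional[int]) -> Graph:
    """Build the assumed topology; bc_cost=None removes the B-C link (Hold-down)."""
    links = [("A", "B", 10), ("C", "E", 10), ("B", "D", 20), ("D", "E", 20)]
    if bc_cost is not None:
        links.append(("B", "C", bc_cost))
    graph: Graph = {name: {} for name in "ABCDE"}
    for u, v, cost in links:
        graph[u][v] = cost
        graph[v][u] = cost
    return graph


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[int, List[str]]:
    """Dijkstra's algorithm returning (total cost, list of nodes on the path)."""
    heap: List[Tuple[int, str, List[str]]] = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(heap, (cost + link_cost, neighbor, path + [neighbor]))
    raise ValueError(f"no path from {src} to {dst}")


normal = shortest_path(build_topology(10), "A", "E")
hold_max_cost = shortest_path(build_topology(65535), "A", "E")
hold_down = shortest_path(build_topology(None), "A", "E")
```

With these assumed costs, the normal path runs through Device C, and both suppression modes move traffic to the path through Device D.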

Single-forwarding path scenario

When only one forwarding path exists on the network, the flapping of the neighbor relationship between any two devices on the path will interrupt traffic forwarding. In Figure 2, the only traffic forwarding path is Device A -> Device B -> Device C -> Device E. If the neighbor relationship between Device B and Device C flaps, and the flapping meets suppression conditions, flapping suppression takes effect. However, if the neighbor relationship between Device B and Device C is prevented from being reestablished, the network is partitioned. Therefore, Hold-max-cost mode (rather than Hold-down mode) is recommended. In Hold-max-cost mode, 65535 is used as the cost of the link between Device B and Device C during the suppression period, so the link remains usable. After the network stabilizes and the suppression timer expires, the configured link cost is restored.

By default, the Hold-max-cost mode takes effect.

Figure 2 Flapping suppression in a single-forwarding path scenario

Broadcast scenario

As shown in Figure 3, four devices are connected to the same broadcast network through switches, and the devices are broadcast network neighbors. Assume that Device C flaps due to a link failure, and that Device A and Device B were deployed at different times (for example, Device A was deployed earlier) or use different flapping suppression parameters. Device A detects the flapping first and suppresses Device C. Consequently, the Hello packets sent by Device A do not carry Device C's router ID. However, Device B has not detected the flapping yet and still considers Device C a valid node. As a result, the DR candidates identified by Device A are Device B and Device D, whereas the DR candidates identified by Device B are Device A, Device C, and Device D. Different sets of DR candidates can produce different DR election results, which may lead to route calculation errors.

To prevent this problem in scenarios where an interface has multiple neighbors, such as on a broadcast, P2MP, or NBMA network, all neighbors on the interface are suppressed when the status of a neighbor relationship last changes to ExStart or Down. Specifically, if Device C flaps, Device A, Device B, and Device D on the broadcast network are all suppressed. After the network stabilizes and the suppression timer expires, Device A, Device B, and Device D are restored to normal status.

Figure 3 Flapping suppression on a broadcast network
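
The inconsistency described for the broadcast scenario can be illustrated with simple set arithmetic. This sketch only models which neighbors each device still regards as DR candidates, following the description above; it does not implement DR election, and the function name is hypothetical.

```python
from typing import Set


def dr_candidates(self_id: str, segment: Set[str], suppressed: Set[str]) -> Set[str]:
    """Neighbors on the segment that this device still treats as DR candidates."""
    return segment - suppressed - {self_id}


segment = {"A", "B", "C", "D"}

# Device A has already detected the flapping and suppressed Device C,
# whereas Device B has not detected it yet.
view_of_a = dr_candidates("A", segment, suppressed={"C"})
view_of_b = dr_candidates("B", segment, suppressed=set())

# Device C appears in one view but not in the other.
disputed = (view_of_a | view_of_b) - {"A", "B"} - (view_of_a & view_of_b)
```

Here `view_of_a` contains Device B and Device D, `view_of_b` contains Device A, Device C, and Device D, and `disputed` contains only Device C, which is the device the two views disagree on.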

Multi-area scenario

In Figure 4, Device A, Device B, Device C, Device E, and Device F are connected in area 1, and Device B, Device D, and Device E are connected in backbone area 0. Traffic from Device A to Device F is preferentially forwarded along an intra-area route, and the forwarding path is Device A -> Device B -> Device C -> Device E -> Device F. When the neighbor relationship between Device B and Device C flaps and the flapping meets suppression conditions, flapping suppression takes effect in the default mode (Hold-max-cost). However, the forwarding path remains unchanged (Device A -> Device B -> Device C -> Device E -> Device F) because, according to OSPF route selection rules, intra-area routes take precedence over inter-area routes regardless of cost. Hold-max-cost mode therefore cannot move traffic off the flapping link in this case. To prevent traffic loss in multi-area scenarios, configure Hold-down mode to prevent the neighbor relationship between Device B and Device C from being reestablished during the suppression period. During this period, traffic is forwarded along the path Device A -> Device B -> Device D -> Device E -> Device F.

By default, the Hold-max-cost mode takes effect in OSPF. The mode can be changed to Hold-down manually.

Figure 4 Flapping suppression in a multi-area scenario
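
The reason Hold-max-cost mode has no effect in the multi-area scenario can be shown with a simplified route comparison in which the route type is compared before the cost. The data structure, the function name, and the cost values are assumptions for illustration; only the rule that intra-area routes are preferred over inter-area routes regardless of cost comes from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    path: str
    kind: str   # "intra-area" or "inter-area"
    cost: int


# Lower value wins; route type is compared before cost.
TYPE_PREFERENCE = {"intra-area": 0, "inter-area": 1}


def best_route(candidates: List[Route]) -> Route:
    """Pick the preferred route: by route type first, then by cost."""
    return min(candidates, key=lambda r: (TYPE_PREFERENCE[r.kind], r.cost))


# Costs are made up; 65535 stands for the B-C link in Hold-max-cost mode.
via_c = Route("A -> B -> C -> E -> F", "intra-area", 10 + 65535 + 10 + 10)
via_d = Route("A -> B -> D -> E -> F", "inter-area", 10 + 20 + 20 + 10)

chosen_hold_max_cost = best_route([via_c, via_d])  # intra-area route still exists
chosen_hold_down = best_route([via_d])             # intra-area route is gone
```

In this model the route through Device C is still chosen in Hold-max-cost mode despite its far higher cost, and the route through Device D is chosen only when Hold-down mode removes the intra-area route.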

Scenario with both LDP-IGP synchronization and neighbor relationship flapping suppression configured

In Figure 5, if the link between PE1 and P1 fails, an LDP LSP switchover is implemented immediately, causing the original LDP LSP to be deleted before a new LDP LSP is established. To prevent traffic loss, LDP-IGP synchronization needs to be configured. With LDP-IGP synchronization, 65535 is used as the cost of the new LSP to be established. After the new LSP is established, the original cost takes effect. Consequently, the original LSP is deleted, and LDP traffic is forwarded along the new LSP.

Both LDP-IGP synchronization and neighbor relationship flapping suppression can work in Hold-down or Hold-max-cost mode. If both features are configured, Hold-down mode takes precedence over Hold-max-cost mode, which in turn takes precedence over the configured link cost. The mode that takes effect does not depend on which feature requested it. Only the final effective state of each feature (Hold-down, Hold-max-cost, or exited) is considered. For details, see Table 1.

Table 1 Principles for selecting the suppression modes that take effect in different situations

| LDP-IGP Synchronization/OSPF Neighbor Relationship Flapping Suppression Mode | LDP-IGP Synchronization Hold-down Mode | LDP-IGP Synchronization Hold-max-cost Mode | Exited from LDP-IGP Synchronization Suppression |
|---|---|---|---|
| OSPF Neighbor Relationship Flapping Suppression Hold-down Mode | Hold-down | Hold-down | Hold-down |
| OSPF Neighbor Relationship Flapping Suppression Hold-max-cost Mode | Hold-down | Hold-max-cost | Hold-max-cost |
| Exited from OSPF Neighbor Relationship Flapping Suppression | Hold-down | Hold-max-cost | Exited from LDP-IGP synchronization and neighbor relationship flapping suppression |

For example, the link between PE1 and P1 frequently flaps in Figure 5. Both LDP-IGP synchronization and neighbor relationship flapping suppression are configured. In this case, the suppression mode is selected based on the preceding principles. That is, the neighbor relationship between PE1 and P1 cannot be established within a period of time or the cost of the link between PE1 and P1 is set to the maximum value (65535). In this way, service traffic is switched to the PE1 -> P4 -> P3 -> PE2 path.
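
The selection principles in Table 1 amount to choosing the stricter of the two states. A minimal sketch follows, assuming each feature reports one of the three states as a string; the function name and the ranking dictionary are hypothetical.

```python
# Higher rank means higher precedence: Hold-down > Hold-max-cost > exited.
RANK = {"hold-down": 2, "hold-max-cost": 1, "exited": 0}


def effective_state(ldp_igp_sync: str, flapping_suppression: str) -> str:
    """Return the state that takes effect when both features are configured."""
    return max(ldp_igp_sync, flapping_suppression, key=RANK.__getitem__)
```

For example, `effective_state("hold-max-cost", "hold-down")` returns `"hold-down"`, and the link falls back to its configured cost only when both features have exited suppression.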

Figure 5 Scenario with both LDP-IGP synchronization and neighbor relationship flapping suppression configured

Scenario with both bit-error-triggered protection switching and neighbor relationship flapping suppression configured

Bit-error-triggered protection switching is used to protect services against poor link quality. If link quality deteriorates and the bit error rate (BER) is high, a bit error event is reported. Because user services carried on a link with a high BER may be affected, user traffic needs to be switched to other links. After receiving the bit error event, the OSPF module sets the interface cost to the maximum value (65535) and recalculates routes so that service traffic is switched to the backup link. If both bit-error-triggered protection switching and neighbor relationship flapping suppression are configured, both take effect. Hold-down mode takes precedence over Hold-max-cost mode, which in turn takes precedence over the configured link cost.

Copyright © Huawei Technologies Co., Ltd.