Bidirectional Forwarding Detection (BFD) detects communication faults between forwarding engines. BFD monitors the connectivity of a data protocol on a path between two systems. This path can be a tunnel, a physical link, or a logical link.
BFD for OSPF enables BFD sessions to be associated with OSPF. If a BFD session detects a link fault, it notifies OSPF of the fault, allowing OSPF to quickly respond to the change in the network topology.
Devices re-calculate routes in the event of a link fault or a topology change. Network performance is improved if the route convergence process of routing protocols is completed faster.
When BFD is associated with OSPF, a fault between neighbors is detected sooner, which speeds up OSPF route convergence.
BFD Association | Link Fault Detection Condition | Convergence Speed |
---|---|---|
No | The OSPF dead timer expires. (The default timeout interval is 40 seconds.) | Measured in seconds |
Yes | A BFD session goes Down. | Measured in milliseconds |

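
The gap between the two detection conditions can be made concrete with a little arithmetic. The sketch below assumes the common rule that BFD declares a session Down after roughly the negotiated receive interval multiplied by the detect multiplier. The 100 ms interval and multiplier of 3 are illustrative assumptions, not defaults of any device; only the 40-second dead interval comes from the comparison above.

```python
# Illustrative timer values only. The 40-second OSPF dead interval is the
# default quoted in the comparison above; the BFD values are assumptions.
OSPF_DEAD_INTERVAL_S = 40.0


def bfd_detection_time_ms(rx_interval_ms: int, detect_multiplier: int) -> int:
    """Approximate BFD detection time: receive interval x detect multiplier."""
    return rx_interval_ms * detect_multiplier


def speedup(rx_interval_ms: int, detect_multiplier: int) -> float:
    """Ratio of the OSPF dead interval to the BFD detection time."""
    bfd_s = bfd_detection_time_ms(rx_interval_ms, detect_multiplier) / 1000.0
    return OSPF_DEAD_INTERVAL_S / bfd_s


if __name__ == "__main__":
    # Hypothetical session: 100 ms interval, detect multiplier 3.
    print(bfd_detection_time_ms(100, 3), "ms to detect the fault with BFD")
    print(round(speedup(100, 3), 1), "times faster than the dead timer")
```

With these assumed values, fault detection takes 300 ms instead of 40 seconds.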
Figure 1 shows the process of BFD for OSPF, which is as follows:
OSPF neighbor relationships are established between SwitchA, SwitchB, and SwitchC.
A full neighbor relationship triggers BFD, which then establishes a BFD session.
The outbound interface of the route from SwitchA to SwitchB is GE0/0/2. If the link between SwitchA and SwitchB fails, BFD detects the fault and then notifies SwitchA of the fault.
SwitchA processes the Down neighbor relationship event, and then re-calculates routes. Following re-calculation, the outbound interface of the route from SwitchA to SwitchB becomes GE0/0/1, with the route to SwitchB traversing SwitchC.
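
The route re-calculation in the last two steps can be sketched as a shortest-path computation that is rerun after the failed link is removed. This is a minimal illustration, not the device's SPF implementation: the link costs and the helper names (`build_graph`, `first_hop`, `remove_failed_link`, `outbound_interface`) are hypothetical, and only the interface names on SwitchA are taken from the steps above.

```python
import heapq

# Hypothetical link costs. Only the interface names on SwitchA
# (GE0/0/2 toward SwitchB, GE0/0/1 toward SwitchC) come from the text.
LINKS = {
    ("SwitchA", "SwitchB"): 1,
    ("SwitchA", "SwitchC"): 1,
    ("SwitchC", "SwitchB"): 1,
}
SWITCHA_INTERFACES = {"SwitchB": "GE0/0/2", "SwitchC": "GE0/0/1"}


def build_graph(links):
    """Turn undirected (a, b) -> cost entries into an adjacency map."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def first_hop(graph, source, target):
    """Dijkstra: return the neighbor of source on the shortest path to target."""
    best = {source: 0}
    hop_of = {source: None}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        if node == target:
            return hop_of[node]
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                hop_of[neighbor] = neighbor if node == source else hop_of[node]
                heapq.heappush(heap, (new_cost, neighbor))
    return None  # target unreachable


def remove_failed_link(links, a, b):
    """Model the BFD Down notification: drop the link between a and b."""
    return {pair: cost for pair, cost in links.items() if set(pair) != {a, b}}


def outbound_interface(links, target):
    """Interface SwitchA uses to reach target, or None if unreachable."""
    hop = first_hop(build_graph(links), "SwitchA", target)
    return SWITCHA_INTERFACES.get(hop)


if __name__ == "__main__":
    print(outbound_interface(LINKS, "SwitchB"))
    after_fault = remove_failed_link(LINKS, "SwitchA", "SwitchB")
    print(outbound_interface(after_fault, "SwitchB"))
```

Under the assumed costs, the outbound interface is GE0/0/2 before the fault and GE0/0/1 afterward, matching the behavior described in the steps.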
When a device restarts or a new device is added to the network, traffic may be lost during BGP convergence. This happens because OSPF converges faster than BGP.
The solution to this problem is association between OSPF and BGP.
If there is a backup link, BGP traffic is lost during traffic switchback because BGP route convergence is slower than OSPF route convergence.
In Figure 2, SwitchA, SwitchB, SwitchC, and SwitchD are running OSPF and have established IBGP connections. SwitchC functions as the backup of SwitchB. When the network is stable, BGP and OSPF routes converge completely on the devices.
Traffic from SwitchA to 10.3.1.0/30 normally passes through SwitchB. If SwitchB becomes faulty, traffic is switched to SwitchC. After SwitchB recovers, traffic is switched back to SwitchB.
During the switchback process, some packets are lost because OSPF route convergence is faster than BGP route convergence. OSPF routes have already converged while BGP routes are still converging, so SwitchB has not yet learned the route to 10.3.1.0/30. When packets from SwitchA destined for 10.3.1.0/30 arrive, SwitchB discards them.
A device with OSPF-BGP association enabled acts as a stub router for the configured association period. During this period, the link metric in the LSAs advertised by the device is set to the maximum value of 65535. In this way, the device instructs other OSPF devices not to use it for data forwarding.
In Figure 2, OSPF-BGP association is enabled on SwitchB. SwitchA continues to use the backup device SwitchC for data forwarding until BGP route convergence on SwitchB is complete.
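
The effect of the maximum metric on path selection can be illustrated with a small shortest-path sketch. The topology below is an assumption made for illustration (SwitchA reaches SwitchD through either SwitchB or SwitchC), as are the link costs and the function names; only the metric value 65535 and the switch names come from the text.

```python
import heapq

MAX_LINK_METRIC = 65535  # maximum metric stated in the text


def advertised_links(stub_routers=()):
    """Directed link costs as each router advertises them in its LSAs.

    Topology and base costs are hypothetical. A stub router advertises
    MAX_LINK_METRIC on all of its own links.
    """
    base = {
        "SwitchA": {"SwitchB": 1, "SwitchC": 2},
        "SwitchB": {"SwitchA": 1, "SwitchD": 1},
        "SwitchC": {"SwitchA": 2, "SwitchD": 2},
        "SwitchD": {"SwitchB": 1, "SwitchC": 2},
    }
    return {
        router: {
            neighbor: MAX_LINK_METRIC if router in stub_routers else cost
            for neighbor, cost in neighbors.items()
        }
        for router, neighbors in base.items()
    }


def shortest_path(graph, source, target):
    """Dijkstra over directed costs; returns the list of routers on the path."""
    best = {source: 0}
    previous = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        if node == target:
            break
        for neighbor, link_cost in graph[node].items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return None
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    print(shortest_path(advertised_links(), "SwitchA", "SwitchD"))
    print(shortest_path(advertised_links({"SwitchB"}), "SwitchA", "SwitchD"))
```

While SwitchB advertises the maximum metric, the path through SwitchC is cheaper, so SwitchA keeps forwarding through the backup device. Once SwitchB advertises normal metrics again, the path through SwitchB is selected.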
On networks that use primary and backup links, OSPF-LDP association ensures that, when a faulty primary link recovers, traffic interruptions are minimized.
As shown in Figure 3, the primary link travels along the path PE1→P1→P2→P3→PE2, and the backup link travels along the path PE1→P1→P4→P3→PE2.
When the primary link is faulty, traffic is switched to the backup link. After the primary link recovers, traffic is switched back to the primary link. Because OSPF routes converge before the LSP on the primary link is re-established, traffic is interrupted for an extended period during the switchback.
Synchronizing LDP and OSPF (IGP) on P1 and P2 can shorten the duration of traffic interruption caused by traffic switchback to the primary link from seconds to milliseconds.
OSPF-LDP association delays route switchback by suppressing the establishment of OSPF neighbor relationships until LDP convergence is complete. Before an LSP is established on the primary link, the backup link continues to forward traffic. The backup link is then deleted following the LSP establishment on the primary link.
OSPF-LDP association involves three timers:
Hold-down
Hold-max-cost
Delay
When the primary link recovers, a router responds as follows:
Starts the Hold-down timer. The OSPF interface does not establish OSPF neighbor relationships but waits for the establishment of an LDP session. The Hold-down timer specifies how long the OSPF interface waits.
Starts the Hold-max-cost timer after the Hold-down timer expires. The Hold-max-cost timer specifies how long OSPF advertises the maximum link metric for the interface in its LSAs on the primary link.
Starts the Delay timer after an LDP session is re-established on the primary link, to wait for the LSP to be established.
After the Delay timer expires, LDP notifies OSPF that synchronization is complete, regardless of the OSPF status.
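
The interaction of the three timers can be sketched as a timeline of events. This is a simplified model under stated assumptions, not the device's state machine: it assumes that the Hold-down and Hold-max-cost timers stop mattering once synchronization is complete, and the names `SyncTimers`, `switchback_timeline`, and the event labels are hypothetical. The timer values in the usage example are illustrative, not defaults.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SyncTimers:
    hold_down_s: float      # how long OSPF waits for the LDP session
    hold_max_cost_s: float  # how long the maximum metric is advertised
    delay_s: float          # wait for the LSP after the LDP session is up


def switchback_timeline(
    timers: SyncTimers, ldp_session_up_at_s: Optional[float]
) -> List[Tuple[float, str]]:
    """Events after the primary link recovers at t = 0 (simplified model).

    ldp_session_up_at_s is the assumed time at which the LDP session is
    re-established, or None if it never comes up.
    """
    events = [(0.0, "HOLD_DOWN_START")]
    sync_done = None
    if ldp_session_up_at_s is not None:
        # The Delay timer runs from LDP session re-establishment.
        sync_done = ldp_session_up_at_s + timers.delay_s
        events.append((ldp_session_up_at_s, "LDP_SESSION_UP"))
        events.append((sync_done, "SYNC_COMPLETE"))
    hold_down_end = timers.hold_down_s
    # Assumption: Hold-down only expires if synchronization is not done first.
    if sync_done is None or hold_down_end < sync_done:
        events.append((hold_down_end, "HOLD_MAX_COST_START"))
        max_cost_end = hold_down_end + timers.hold_max_cost_s
        if sync_done is None or max_cost_end < sync_done:
            events.append((max_cost_end, "HOLD_MAX_COST_EXPIRED"))
    return sorted(events)


if __name__ == "__main__":
    timers = SyncTimers(hold_down_s=10.0, hold_max_cost_s=10.0, delay_s=5.0)
    for time_s, event in switchback_timeline(timers, ldp_session_up_at_s=12.0):
        print(f"t={time_s:>5.1f}s  {event}")
```

In this model, an LDP session that comes up early lets synchronization finish before the Hold-down timer expires, so the maximum metric is never advertised. A late session causes the interface to advertise the maximum metric until synchronization completes or the Hold-max-cost timer expires.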