Reliability technologies shorten traffic interruption time and ensure quality of service on a network, improving user experience.
Device reliability can be assessed from the following aspects: system, hardware, and software reliability design; reliability test and verification; IP network reliability design.
As networks develop rapidly and applications diversify, value-added services (VASs) have become widely used, and demand for network bandwidth has increased dramatically. Any network service interruption can therefore cause carriers substantial losses.
Demands for network infrastructure reliability are increasing.
This chapter describes IP reliability technologies supported by the NetEngine 8000 F.
Reliability indexes include the mean time to repair (MTTR), mean time between failures (MTBF), and availability.
Generally, product or system reliability is assessed based on the MTTR and MTBF.
The MTTR is calculated using the following formula:
MTTR = Fault detection time + Board replacement time + System initialization time + Link recovery time + Route convergence time + Forwarding recovery time
Each term adds directly to the MTTR, so shortening any of them shortens the MTTR and increases device availability.
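The following is a minimal Python sketch of the MTTR formula. The phase names and durations are hypothetical placeholders, not measured NetEngine 8000 F values. It shows only that the MTTR is a plain sum, so cutting any one phase (for example, faster fault detection) reduces the total by exactly that amount.

```python
# Hypothetical per-phase recovery times, in seconds. These values are
# illustrative only and are NOT measured NetEngine 8000 F figures.
recovery_phases_s = {
    "fault_detection": 0.05,
    "board_replacement": 600.0,
    "system_initialization": 300.0,
    "link_recovery": 10.0,
    "route_convergence": 5.0,
    "forwarding_recovery": 1.0,
}


def mttr(phases: dict[str, float]) -> float:
    """Return the MTTR as the sum of all recovery phases (same unit as the inputs)."""
    return sum(phases.values())


total = mttr(recovery_phases_s)
print(f"MTTR = {total:.2f} s")
```

With these placeholder values, board replacement and system initialization dominate the sum, so they are where a shorter MTTR would be gained first.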
Availability is calculated using the following formula:
Availability = MTBF/(MTBF + MTTR)
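A minimal sketch of the availability formula in Python. The MTBF and MTTR inputs below are hypothetical; the only requirement is that both use the same time unit.

```python
def availability(mtbf: float, mttr: float) -> float:
    """Availability = MTBF / (MTBF + MTTR); both inputs must use the same unit."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr)


# Hypothetical example: MTBF of 100,000 hours and MTTR of 1 hour.
a = availability(100_000, 1)
print(f"Availability = {a:.6%}")
```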
In the telecom industry, 99.999% availability means that service interruptions caused by device failures total no more than about 5 minutes per year (about 5.26 minutes in a 365-day year).
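As a worked check of the five-nines figure, assuming a 365-day year ($T_{\text{year}} = 525\,600$ minutes), the permitted annual downtime is:

```latex
\begin{aligned}
\text{Annual downtime} &= (1 - \text{Availability}) \times T_{\text{year}} \\
  &= (1 - 0.99999) \times (365 \times 24 \times 60)\ \text{min} \\
  &= 10^{-5} \times 525\,600\ \text{min} \\
  &\approx 5.26\ \text{min}
\end{aligned}
```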
On live networks, faults and service interruptions are inevitable for various reasons, so the MTBF cannot be made arbitrarily large. Availability can therefore be improved by decreasing the MTTR.
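Rearranging Availability = MTBF/(MTBF + MTTR) gives MTTR = MTBF × (1 − Availability)/Availability. This turns the goal of decreasing the MTTR into a concrete budget for a given MTBF. The following is a minimal sketch; the function name `max_mttr` and the 50,000-hour MTBF are hypothetical.

```python
def max_mttr(mtbf: float, target_availability: float) -> float:
    """Largest MTTR (same unit as MTBF) that still meets the target availability.

    Rearranges Availability = MTBF / (MTBF + MTTR) into
    MTTR = MTBF * (1 - Availability) / Availability.
    """
    if not 0 < target_availability <= 1:
        raise ValueError("target availability must be in (0, 1]")
    return mtbf * (1 - target_availability) / target_availability


# Hypothetical MTBF of 50,000 hours with a 99.999% availability target.
budget_h = max_mttr(50_000, 0.99999)
print(f"MTTR budget: {budget_h * 60:.1f} minutes")
```

Under these assumptions, the average repair has to finish within about 30 minutes. A device with a lower MTBF gets a proportionally smaller budget.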
Reliability requirements at different levels differ in their targets and implementations.
Table 1 describes three reliability requirement levels and their targets and implementations.
| Level | Target | Implementation |
|---|---|---|
| 1 | Few faults in system software and hardware | |
| 2 | No impact on the system if a fault occurs | Redundancy design, switchover policy, and switchover success rate improvement |
| 3 | Rapid recovery if a fault occurs and affects the system | Fault detection, diagnosis, isolation, and rectification |
Networking principles for highly reliable IP networks include hierarchical networking, redundancy, and load balancing.
The details are as follows:
Hierarchical networking: A network is divided into three layers: the core layer, convergence layer, and edge layer. Based on current or predicted service status, redundant backup is configured so that each customer edge device is dual-homed to devices at the convergence layer. Each convergence-layer device is in turn dual-homed to multiple devices in a single node or in different nodes at the upper layer. Devices at the core and convergence layers can be deployed as required. Core-layer devices are fully or partially meshed, so any two of them can reach each other directly over a single route with fast forwarding, without passing through multiple interconnected devices.
Within the same layer, multiple interconnections between devices are preferred. Within a single node, deploying multiple devices is preferred.
A lower-layer device is dual- or multi-homed to multiple devices in a single node or different nodes.
Adjustments can be made based on the actual traffic volume.
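The dual-homing principle can be checked mechanically against a topology description. The following Python sketch assumes a hypothetical topology model: a device-to-layer map plus a map of each device's uplinks, with illustrative device names. It flags any device below the core layer that connects to fewer than two devices in the layer directly above it. It does not model node placement, core-layer mesh connectivity, or traffic-based adjustment.

```python
# Hypothetical three-layer topology; all device names are illustrative.
LAYERS = ["edge", "convergence", "core"]  # ordered bottom to top

layer_of = {
    "CE1": "edge", "CE2": "edge",
    "AGG1": "convergence", "AGG2": "convergence",
    "CORE1": "core", "CORE2": "core",
}

# Each device maps to the set of devices it connects to upward.
uplinks = {
    "CE1": {"AGG1", "AGG2"},
    "CE2": {"AGG1"},  # single-homed: should be flagged
    "AGG1": {"CORE1", "CORE2"},
    "AGG2": {"CORE1", "CORE2"},
}


def single_homed(layer_of, uplinks, min_uplinks=2):
    """Return devices below the core layer that have fewer than min_uplinks
    distinct uplinks to the layer directly above them."""
    flagged = []
    for device, layer in layer_of.items():
        idx = LAYERS.index(layer)
        if idx == len(LAYERS) - 1:
            continue  # core devices have no upper layer
        upper = LAYERS[idx + 1]
        peers = {p for p in uplinks.get(device, set()) if layer_of.get(p) == upper}
        if len(peers) < min_uplinks:
            flagged.append(device)
    return sorted(flagged)


print(single_homed(layer_of, uplinks))  # ['CE2']
```

Raising `min_uplinks` above 2 would express the multi-homed variant of the same rule.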