PIM-SM

PIM-SM implements P2MP data transmission on large-scale networks on which multicast data receivers are sparsely distributed. PIM-SM forwards multicast data only to network segments with receivers that have requested the data.

PIM-SM assumes that no host wants to receive multicast data. Therefore, PIM-SM sets up an MDT only after a host requests multicast data, and then sends the data to the host along the MDT.

Concepts

Basic PIM-SM concepts are described based on the networking shown in Figure 1.

Figure 1 PIM-SM network

PIM device

A router that runs PIM is called a PIM device. A router interface on which PIM is enabled is called a PIM interface.
PIM domain

A network constructed by PIM devices is called a PIM network.

A PIM-SM network can be divided into multiple PIM-SM domains by configuring BSR boundaries on router interfaces to restrict BSR message transmission. PIM-SM domains isolate multicast traffic between domains and facilitate network management.
DR
A designated router (DR) can be a multicast source's DR or a receiver's DR.
- In PIM-SM, a multicast source's DR is a PIM device directly connected to a multicast source and is responsible for sending Register messages to a Rendezvous Point (RP).
- A receiver's DR is a PIM device directly connected to receivers and is responsible for sending Join messages to an RP and forwarding multicast data to the receivers.

RP

An RP is the forwarding core in a PIM-SM domain, used to process join requests of the receiver's DR and registration requests of the multicast source's DR. An RP constructs an MDT with itself at the root and creates (S, G) entries to transmit multicast data to hosts. All routers in the PIM-SM domain must know the RP's location. The following table lists the types of RPs.

**Table 1** RP classifications
| RP Type | Implementation | Usage Scenario | Precautions |
| --- | --- | --- | --- |
| Static RP | A static RP is manually configured. If a static RP is used, the same RP address must be configured on all PIM devices in the same domain. | Static RPs are recommended on small and medium-sized networks because such networks are stable and have low requirements on network devices. NOTE: If only one multicast source exists on the network, setting the device directly connected to the multicast source as a static RP is recommended. In this case, the RP is also the source's DR, so the source's DR does not need to register with the RP. | To use a static RP, ensure that all routers, including the RP, have the same RP and multicast group address range information. |
| Dynamic RP | A dynamic RP is elected among candidate-RPs (C-RPs) in the same PIM domain. The BSR collects all C-RP information into an RP-Set and uses Bootstrap messages to advertise the RP-Set to all PIM devices in the domain. Then, all the PIM devices use the same RP-Set information and follow the same rules to elect an RP. If the elected RP fails, a new RP is elected from the remaining C-RPs. | Dynamic RPs can be used on large-scale networks to improve network reliability and maintainability. If multiple multicast sources are densely distributed on the network, configuring core devices close to the multicast sources as C-RPs is recommended. If multiple users are densely distributed on the network, configuring core devices close to the users as C-RPs is recommended. | To use a dynamic RP, you must configure a BSR that dynamically advertises group-to-RP mapping information. |
| Embedded-RP | Embedded-RP is a mode used by a router in the ASM model to obtain RP addresses and is used either in an IPv6 PIM-SM domain or between IPv6 PIM-SM domains. An RP address is embedded in an IPv6 group address. Therefore, when obtaining an IPv6 group address, a router also obtains the RP address to which the IPv6 group address corresponds. | MSDP does not support IPv6 networks. As a result, it cannot allow IPv6 PIM-SM domains to learn RP information from each other, which leads to a multicast traffic interruption. Embedded-RP resolves this problem. | - |

BSR

A BSR on a PIM-SM network collects RP information, summarizes that information into an RP-Set (group-RP mapping database), and advertises the RP-Set to the entire PIM-SM network.

A network can have only one BSR but can have multiple C-BSRs. If a BSR fails, a new BSR is elected from the C-BSRs.
RPT

An RPT is an MDT with an RP at the root and group members at the leaves.
SPT

An SPT is an MDT with the multicast source at the root and group members at the leaves.

Implementation

The multicast data forwarding process in a PIM-SM domain is as follows:

Neighbor discovery

Each PIM device in a PIM-SM domain periodically sends Hello messages to the other PIM devices on its directly connected network segments to discover PIM neighbors and maintain PIM neighbor relationships.

By default, a PIM device accepts other PIM control messages or multicast packets from a neighbor, regardless of whether the PIM device has received Hello messages from the neighbor. However, if the neighbor check function is enabled on a PIM device, the device accepts other PIM control messages or multicast packets from a neighbor only after it has received Hello messages from that neighbor.
DR election

PIM devices exchange Hello messages to elect a DR on a shared network segment. The receiver's DR is the only multicast data forwarder on a shared network segment. The source's DR is responsible for forwarding multicast data received from the multicast source to the RP.
RP discovery

An RP is the forwarding core in a PIM-SM domain. A dynamic or static RP forwards multicast data over the entire network.
RPT setup

PIM-SM assumes that no hosts want to receive multicast data. Therefore, PIM-SM sets up an RPT only after a host requests multicast data, and then sends the data from the RP to the host along the RPT.
SPT switchover

A multicast group in a PIM-SM domain is associated with only one RP and one RPT. All multicast data packets are forwarded by the RP. The path along which the RP forwards multicast data may not be the shortest path from the multicast source to receivers. The load of the RP increases when the multicast traffic volume increases. If the multicast data forwarding rate exceeds a configured threshold, an RPT-to-SPT switchover can be implemented to reduce the burden on the RP.

If a network problem occurs, the Assert mechanism or a DR switchover delay can be used to guarantee that multicast data is transmitted properly.

Assert

If multiple multicast data forwarders exist on a network segment, each multicast packet is repeatedly sent across the network segment, generating redundant multicast data. To resolve this issue, the Assert mechanism can be used to select a unique multicast data forwarder on a network segment.
DR switchover delay

If the role of an interface on a PIM device is changed from DR to non-DR, the PIM device immediately stops using this interface to forward data. If the new DR has not received multicast data, multicast data traffic is temporarily interrupted. If a DR switchover delay is configured, the interface continues to forward multicast data until the delay expires. Setting a DR switchover delay prevents multicast data traffic from being interrupted.

The detailed PIM-SM implementation process is as follows:

Neighbor Discovery

Each PIM-enabled interface on a PIM device sends Hello messages. A multicast packet that carries a Hello message has the following features:

- The destination address is 224.0.0.13, indicating that this packet is destined for all PIM devices on the same network segment as the interface that sends this packet.
- The source address is an interface address.
- The TTL value is 1, indicating that the packet is sent only to neighbor interfaces.

Hello messages are used to discover neighbors, adjust protocol parameters, and maintain neighbor relationships.

Discovering PIM neighbors

All PIM devices on the same network segment must receive multicast packets with the destination address 224.0.0.13. Directly connected multicast routers can then learn neighbor information from the received Hello messages.
Adjusting protocol parameters

A Hello message carries the following protocol parameters:
- DR_Priority: priority used by each router to elect a DR. The higher a router's priority is, the higher the probability that the router will be elected as the DR.
- Holdtime: timeout period during which the neighbor remains in the reachable state.
- LAN_Delay: delay for transmitting a Prune message on the shared network segment.
- Override-Interval: interval carried in a Hello message for overriding a Prune message.
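
As an illustration of how these parameters travel in a Hello message, the following Python sketch packs them into a PIMv2 Hello message for IPv4. The option type codes (1, 2, and 19), the header layout, and the default values come from RFC 7761 rather than from this document, and the function names `internet_checksum` and `build_hello` are hypothetical. The IP header fields (destination 224.0.0.13, TTL 1) are outside the scope of the sketch.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit one's complement checksum over the given bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_hello(holdtime: int = 105, dr_priority: int = 1,
                lan_delay_ms: int = 500, override_interval_ms: int = 2500) -> bytes:
    """Build a PIMv2 Hello message (PIM header plus three options)."""
    options = b"".join([
        # Option 1: Holdtime, 2-byte value in seconds
        struct.pack("!HHH", 1, 2, holdtime),
        # Option 2: LAN Prune Delay, T bit left at 0, then Override-Interval
        struct.pack("!HHHH", 2, 4, lan_delay_ms & 0x7FFF, override_interval_ms),
        # Option 19: DR_Priority, 4-byte value
        struct.pack("!HHI", 19, 4, dr_priority),
    ])
    # Version 2, type 0 (Hello), reserved byte, checksum placeholder
    header = struct.pack("!BBH", 0x20, 0, 0)
    checksum = internet_checksum(header + options)
    return struct.pack("!BBH", 0x20, 0, checksum) + options
```

A receiver can validate the message by running `internet_checksum` over the whole message, which yields 0 when the checksum field is correct.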
Maintaining neighbor relationships

PIM devices periodically exchange Hello messages. If a PIM device does not receive a new Hello message from its PIM neighbor within the Holdtime, the router considers the neighbor unreachable and deletes the neighbor from its neighbor list.

PIM neighbor relationship changes cause the multicast topology to change. If an upstream or a downstream neighbor is unreachable, multicast routes re-converge, and the MDT is updated.
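
The Holdtime-based neighbor maintenance described above can be sketched as a small table keyed by neighbor address. This is a minimal illustration, not a device implementation; the class name `NeighborTable` and its methods are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborTable:
    # Maps neighbor address -> absolute time (seconds) at which it expires
    expiry: dict = field(default_factory=dict)

    def on_hello(self, neighbor: str, holdtime: int, now: float) -> None:
        """Each Hello message refreshes the neighbor for another Holdtime."""
        self.expiry[neighbor] = now + holdtime

    def expire(self, now: float) -> list:
        """Remove and return neighbors whose Holdtime elapsed without a new Hello."""
        dead = sorted(n for n, t in self.expiry.items() if t <= now)
        for n in dead:
            del self.expiry[n]
        return dead
```

A caller would trigger route re-convergence for every address returned by `expire`.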

DR Election

The network segment on which a multicast source or group members reside is usually connected to multiple PIM devices, as shown in Figure 2. The PIM devices exchange Hello messages to set up PIM neighbor relationships. A Hello message carries the DR priority and the address of the interface that connects the PIM device to this network segment. Each PIM device compares its local information with the information carried in the Hello messages sent by other PIM devices to elect a DR. This process is called DR election. The election rules are as follows:

1. The PIM router with the highest DR priority wins.
2. If PIM devices have the same DR priority or PIM devices that do not support Hello messages carrying DR priorities exist on the network segment, the PIM device with the highest IP address wins.

Figure 2 DR election
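
The two DR election rules can be expressed as a short comparison. The following Python sketch assumes IPv4 addresses and uses `None` to model a neighbor whose Hello messages carry no DR priority; the names `Candidate` and `elect_dr` are hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    address: IPv4Address
    dr_priority: Optional[int]  # None: the Hello message carried no DR priority


def elect_dr(candidates):
    """Return the DR among the PIM devices on one shared network segment."""
    # If any device does not advertise a DR priority, only addresses are compared.
    if any(c.dr_priority is None for c in candidates):
        return max(candidates, key=lambda c: int(c.address))
    # Otherwise the highest priority wins, and the highest address breaks ties.
    return max(candidates, key=lambda c: (c.dr_priority, int(c.address)))
```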

RP Discovery

Static RP

A static RP is specified using a command. A static RP's address needs to be manually configured on other routers so they can find and use this RP for data forwarding.
Dynamic RP

A dynamic RP is elected from a set of PIM devices.

Figure 3 Dynamic RP election

In Figure 3, the dynamic RP election rules are as follows:
1. To use a dynamic RP, configure C-BSRs to elect a BSR among the set of C-BSRs.
  
  Each C-BSR considers itself a BSR and advertises a Bootstrap message. The Bootstrap message carries the address and priority of the C-BSR. Each router compares the information contained in all received Bootstrap messages to determine which C-BSR becomes the BSR. The election rules are as follows:
  1. If the C-BSRs have different priorities, the C-BSR with the highest priority (largest priority value) is elected as the BSR.
  2. If the C-BSRs have the same priority, the C-BSR with the highest IP address is elected as the BSR.
  All routers use the same election rule and therefore they will elect the same BSR and learn the BSR address.
2. The C-RPs send C-RP Advertisement messages to the BSR. Each message carries the address of the C-RP that sent it, the range of multicast groups that the C-RP serves, and the priority of the C-RP.
3. The BSR collects the received information as an RP-Set, encapsulates the RP-Set information in a Bootstrap message, and advertises the Bootstrap message to all PIM-SM devices.
4. Each router uses the RP-Set information to perform calculation and comparison using the same rule to elect an RP from multiple C-RPs. The election rules are as follows:
  1. The C-RP with the longest mask length of the served group address range matching the specific multicast group wins.
  2. If group addresses that all C-RPs serve have the same mask length, the C-RP with the highest priority wins (a larger priority value indicates a lower priority).
  3. If the C-RPs have the same priority, a hash function is applied. The C-RP with the greatest calculated value wins.
  4. If all the preceding factors are the same, the C-RP with the highest IP address wins.
5. Because all routers use the same RP-Set and the same election rules, the mapping between the multicast group and the RP is the same for all the routers. The routers save the mapping to guide subsequent multicast operations.
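
The BSR and RP election rules above can be sketched in Python as two ordered comparisons. This is a minimal IPv4-only illustration with hypothetical names (`CBsr`, `CRp`, `elect_bsr`, `rp_hash`, `elect_rp`). The document does not specify the hash function; the sketch assumes the one defined in RFC 7761 and a hash mask length of 30, both of which are assumptions.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network


@dataclass(frozen=True)
class CBsr:
    address: IPv4Address
    priority: int  # larger value = higher priority


@dataclass(frozen=True)
class CRp:
    address: IPv4Address
    group_range: IPv4Network
    priority: int  # larger value = LOWER priority


def elect_bsr(candidates):
    """Highest priority wins; the highest IP address breaks ties."""
    return max(candidates, key=lambda c: (c.priority, int(c.address)))


def rp_hash(group: IPv4Address, rp: IPv4Address, hash_mask_len: int = 30) -> int:
    """Hash function from RFC 7761 (assumed here, not taken from this document)."""
    mask = (0xFFFFFFFF << (32 - hash_mask_len)) & 0xFFFFFFFF
    g = int(group) & mask
    return (1103515245 * ((1103515245 * g + 12345) ^ int(rp)) + 12345) % (1 << 31)


def elect_rp(group: IPv4Address, rp_set, hash_mask_len: int = 30):
    """Apply the four RP election rules in order to the C-RPs serving the group."""
    matching = [c for c in rp_set if group in c.group_range]
    if not matching:
        return None
    return max(matching, key=lambda c: (
        c.group_range.prefixlen,                      # 1. longest mask
        -c.priority,                                  # 2. smallest priority value
        rp_hash(group, c.address, hash_mask_len),     # 3. greatest hash value
        int(c.address),                               # 4. highest IP address
    ))
```

Because every router evaluates the same key over the same RP-Set, every router arrives at the same group-to-RP mapping.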
If a router needs to interwork with an auto-RP-capable device, auto-RP listening must be enabled. After auto-RP listening is enabled, the router can receive auto-RP announcement and discovery messages, parse the messages to obtain source addresses, and perform RPF checks based on the source addresses.
- If an RPF check fails, the router discards the auto-RP message.
- If an RPF check succeeds, the router forwards the auto-RP message to PIM neighbors. The auto-RP message carries the multicast group address range served by the RP to guide subsequent multicast operations.
Auto-RP listening is supported only in IPv4 scenarios.
Embedded RP

Embedded-RP is a mode used by the router in the ASM model to obtain an RP address and applies only to IPv6 PIM-SM. To ensure consistent RP election results, an RP obtained in embedded-RP mode takes precedence over RPs elected using other mechanisms. The address of an RP obtained in embedded-RP mode must be embedded in an IPv6 multicast group address, which must meet both of the following conditions:
- The IPv6 multicast group address must be within the IPv6 multicast address range.
- The IPv6 multicast group address must not be within the SSM group address range.
After a router calculates the RP address from the IPv6 multicast group address, the router uses the RP address to discover a route for forwarding multicast packets. The process for calculating the RP address is as follows:
1. The router copies the first N bits of the network prefix in the IPv6 multicast group address. Here, N is specified by the plen field.
2. The router replaces the last four bits with the contents of the RIID field. An RP address is then obtained. RIID indicates the interface ID of the RP. There is no default value.
Figure 4 shows the mapping between the IPv6 multicast group address and RP address.

Figure 4 Mapping between the IPv6 multicast group address and RP address
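
The calculation of an RP address from an IPv6 group address can be sketched in Python as follows. The exact bit positions of the RIID, plen, and network prefix fields, and the FF7x::/12 check, follow the layout defined in RFC 3956; they are assumptions here because this document does not spell them out. The function name `embedded_rp_address` is hypothetical. Bits of the RP address between the copied prefix and the final four bits are set to zero.

```python
from ipaddress import IPv6Address


def embedded_rp_address(group: str) -> IPv6Address:
    """Derive the RP address embedded in an IPv6 multicast group address."""
    g = int(IPv6Address(group))
    # The address must start with FF and carry flags 0111 (RFC 3956 layout).
    if g >> 120 != 0xFF or (g >> 116) & 0xF != 0x7:
        raise ValueError("not an embedded-RP group address (expected FF7x::/12)")
    riid = (g >> 104) & 0xF   # interface ID of the RP, 4 bits
    plen = (g >> 96) & 0xFF   # number of prefix bits to copy
    if riid == 0 or not 1 <= plen <= 64:
        raise ValueError("invalid RIID or plen")
    prefix_field = (g >> 32) & ((1 << 64) - 1)
    # Step 1: keep only the first plen bits of the 64-bit network prefix field.
    prefix = (prefix_field >> (64 - plen)) << (64 - plen)
    # Step 2: place the RIID in the last four bits of the address.
    return IPv6Address((prefix << 64) | riid)
```

For example, under these assumptions the group address `ff7e:140:2001:db8:beef:feed::1` (RIID 1, plen 64) maps to the RP address `2001:db8:beef:feed::1`.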
Anycast RP

In a traditional PIM-SM domain, each multicast group is mapped to only one RP. When the network is overloaded or traffic is heavy, many network problems can occur. For example, if the RP is overloaded, routes will converge slowly, or the multicast forwarding path will not be optimal.
Anycast-RP can be used to address these problems. Currently, Anycast-RP can be implemented through MSDP or PIM:
- Through MSDP: Multiple RPs with the same address are configured in a PIM-SM domain and MSDP peer relationships are set up between the RPs to share multicast source information.
  
  This mode is only for use on IPv4 networks. For details about the implementation principles, see Anycast-RP in MSDP.
- Through PIM: Multiple RPs with the same address are configured in a PIM-SM domain and the device where an RP resides is configured with a unique local address to identify the RP. These local addresses are used to set up connectionless peer relationships between the devices. The peers share multicast source information by exchanging Register messages.
  
  This mode is for use on both IPv4 and IPv6 networks.

These two modes cannot both be configured on the same device in a PIM-SM domain. If Anycast-RP is implemented through PIM, you can also configure the device to advertise the source information obtained from MSDP peers in another domain to peers in the local domain.

Receivers and the multicast source each select the RPs closest to their own location to create RPTs. After receiving multicast data, the receiver's DR determines whether to trigger an SPT switchover. This ensures the optimal RPT and load sharing. The following section covers the principles of Anycast-RP in PIM.

Figure 5 Typical networking for Anycast-RP in PIM

As shown in Figure 5, in a PIM-SM domain, multicast sources S1 and S2 send multicast data to multicast group G, and U1 and U2 are members of group G. Perform the following operations to use PIM to implement Anycast-RP in the PIM-SM domain:

1. Configure RP1 and RP2 and assign both the same IP address (address of a loopback interface). Assume that the IP address is 10.10.10.10.
2. Set up a connectionless peer relationship between RP1 and RP2 using unique IP addresses. Assume that the IP address of RP1 is 1.1.1.1 and the IP address of RP2 is 2.2.2.2.

The implementation of Anycast-RP in PIM is as follows:

1. The receiver sends a Join message to the closest RP and builds an RPT.
  - U1 joins the RPT with RP1 as the root, and RP1 creates an (*, G) entry.
  - U2 joins the RPT with RP2 as the root, and RP2 creates an (*, G) entry.
2. The multicast source sends a Register message to the closest RP.
  - DR1 sends a Register message to RP1, and RP1 creates an (S1, G) entry. Multicast data from S1 reaches U1 along the RPT.
  - DR2 sends a Register message to RP2, and RP2 creates an (S2, G) entry. Multicast data from S2 reaches U2 along the RPT.
3. After receiving Register messages from the source's DRs, RPs re-encapsulate the Register messages and forward them to peers to share multicast source information.
  - After receiving the (S1, G) Register message from DR1, RP1 replaces the source and destination addresses with 1.1.1.1 and 2.2.2.2, respectively, and re-encapsulates the message and sends it to RP2. Upon receiving the specially encapsulated Register message from peer 1.1.1.1, RP2 processes this Register message without forwarding it to other peers.
  - After receiving the (S2, G) Register message from DR2, RP2 replaces the source and destination addresses with 2.2.2.2 and 1.1.1.1, respectively, and re-encapsulates the message and sends it to RP1. Upon receiving the specially encapsulated Register message from peer 2.2.2.2, RP1 processes this Register message without forwarding it to other peers.
4. The RP joins an SPT with the source's DR as the root to obtain multicast data.
  - RP1 sends a Join message to S2. Multicast data from S2 first reaches RP1 along the SPT and then reaches U1 along the RPT.
  - RP2 sends a Join message to S1. Multicast data from S1 first reaches RP2 along the SPT and then reaches U2 along the RPT.
5. After receiving multicast data, the receiver's DR determines whether to trigger an SPT switchover.
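
The Register re-encapsulation rule in the procedure above (forward a Register message received from a source's DR to every peer, but never forward one received from a peer) can be sketched in Python. The class `AnycastRp` and its method are hypothetical, and addresses are plain strings for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AnycastRp:
    local_address: str   # unique local address that identifies this RP
    peers: list          # unique local addresses of the other Anycast-RP devices
    sources: set = field(default_factory=set)  # learned (S, G) pairs

    def on_register(self, sender: str, source: str, group: str) -> list:
        """Record (S, G); return re-encapsulated copies as (src, dst) address pairs."""
        self.sources.add((source, group))
        if sender in self.peers:
            # Received from a peer: process it but do not forward it again.
            return []
        # Received from a source's DR: rewrite the addresses and send to each peer.
        return [(self.local_address, peer) for peer in self.peers]
```

With RP1 at 1.1.1.1 and RP2 at 2.2.2.2, a Register message from DR1 produces one copy addressed from 1.1.1.1 to 2.2.2.2, and RP2 returns an empty list when it processes that copy.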

RPT Setup

Figure 6 RPT setup and data forwarding processes

Setting up an RPT creates a forwarding path for multicast data. Figure 6 shows the networking.

- When a multicast source sends the first multicast packet of a multicast group to its DR, the source's DR encapsulates the multicast packet in a Register message and unicasts the Register message to the RP. The RP creates an (S, G) entry to register the multicast source information.
- When a receiver joins a multicast group through IGMP, the receiver's DR sends a Join message to the RP. An (*, G) entry is then created on each hop, and an RPT is created.
- When a receiver joins a multicast group and a multicast source sends a multicast packet for the group, the multicast source's DR encapsulates the multicast packet in a Register message and unicasts the Register message to the RP. The RP then forwards the multicast data along the RPT to group members.
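
The three cases above can be modeled as state kept on the RP. The following Python sketch is a simplified illustration with hypothetical names (`RendezvousPoint`, `on_join`, `on_register`); it tracks only the RP's own entries, not the (*, G) entries created on intermediate hops.

```python
class RendezvousPoint:
    def __init__(self):
        self.star_g = {}   # group -> downstream interfaces, the (*, G) entry
        self.s_g = set()   # registered (source, group) pairs, the (S, G) entries

    def on_join(self, group, interface):
        """A Join message from a receiver's DR adds a branch to the RPT."""
        self.star_g.setdefault(group, set()).add(interface)

    def on_register(self, source, group):
        """Register the source; return the interfaces that receive the data."""
        self.s_g.add((source, group))
        # Data is forwarded only if receivers have joined the group.
        return sorted(self.star_g.get(group, ()))
```

Before any receiver joins, `on_register` returns an empty list, which reflects the on-demand forwarding behavior of the RPT.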

The RPT implements on-demand multicast data forwarding, which reduces bandwidth consumption.

To reduce the RPT forwarding loads and improve multicast data forwarding efficiency, PIM-SM supports SPT switchovers, allowing a multicast network to set up an SPT with the multicast source as the root. Then, the multicast source can send multicast data directly to receivers along the SPT.

SPT Switchover

In a PIM-SM domain, a multicast group interacts with only one RP, and only one RPT is set up. If SPT switchover is not enabled, all multicast packets must be encapsulated in Register messages and then sent to the RP. After receiving the packets, the RP de-encapsulates them and forwards them along the RPT.

Since all multicast packets forwarded along the RPT are transferred by the RP, the RP may be overloaded when multicast traffic is heavy. To resolve this problem, PIM-SM allows the RP or the receiver's DR to trigger an SPT switchover.

Figure 7 SPT switchover triggered by the receiver's DR

An SPT switchover can be triggered by the RP or by the receiver's DR:

SPT switchover triggered by the RP

Register messages sent from the source's DR are decapsulated by the RP, which then forwards multicast data along the RPT to group members. In addition, the RP sends SPT Join messages to the source's DR to set up an SPT from the RP to the source.

After the SPT is set up and starts carrying multicast data packets, the RP stops processing Register messages. This frees the source's DR and RP from encapsulating and decapsulating packets. Multicast data is sent from the router directly connected to the multicast source to the RP along the SPT and then forwarded to group members along the RPT.
SPT switchover triggered by the receiver's DR
1. As shown in Figure 7, multicast data is forwarded along the RPT. The receiver's DR (DeviceD) sends (*, G) Join messages to the RP. Multicast data is sent to the receiver's DR (DeviceD) along the path multicast source's DR (DeviceA) -> RP (DeviceB) -> receiver's DR (DeviceD).
2. The receiver's DR periodically checks the forwarding rate of multicast packets. If the receiver's DR finds that the forwarding rate is greater than the configured threshold, the DR triggers an SPT switchover.
3. The receiver's DR sends (S, G) Join messages to the source's DR. After receiving multicast data along the SPT, the receiver's DR discards multicast data received along the RPT and sends a Prune message to the RP to delete the receiver from the RPT. The switchover from the RPT to the SPT is complete.
4. Multicast data is forwarded along the SPT. Specifically, multicast data is transmitted to receivers along the path multicast source's DR (DeviceA) -> receiver's DR (DeviceD).
An SPT is set up from the source to group members, so subsequent packets can bypass the RP. Because the RPT is not necessarily the shortest path, an SPT switchover reduces the delay in transmitting multicast data on the network.

If one source sends packets to multiple groups simultaneously and an SPT switchover policy is configured for a specific group range:

- Before an SPT switchover, these packets reach the receiver's DR along the RPT.
- After an SPT switchover, only the packets sent to the groups within the range specified in the SPT switchover policy are forwarded along the SPT. Packets sent to other groups are still forwarded along the RPT.
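
The decision made by the receiver's DR, combining the rate threshold with an optional group-range policy, can be sketched in Python. The function name `select_tree`, the kbit/s unit, and the representation of the policy as a list of IPv4 networks are assumptions made for illustration.

```python
from ipaddress import IPv4Address, IPv4Network


def select_tree(group, rate_kbps, threshold_kbps, policy_ranges=None):
    """Return "SPT" if the receiver's DR should switch over, else "RPT"."""
    # Without a policy, every group is eligible for an SPT switchover.
    in_policy = policy_ranges is None or any(group in r for r in policy_ranges)
    if in_policy and rate_kbps > threshold_kbps:
        return "SPT"
    return "RPT"
```

For example, with a policy covering only 225.1.0.0/16, traffic to 225.1.1.1 above the threshold moves to the SPT, while traffic to 226.1.1.1 stays on the RPT regardless of its rate.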

Assert

Either of the following conditions indicates other multicast forwarders are present on the network segment:

- A multicast packet fails the RPF check.
- The interface that receives the multicast packet is a downstream interface in the (S, G) entry on the local router.

If other multicast forwarders are present on the network segment, the router starts the Assert mechanism.

The router sends an Assert message through the downstream interface. The downstream interface also receives an Assert message from a different multicast forwarder on the network segment. The destination address of the multicast packet in which the Assert message is encapsulated is 224.0.0.13. The source address of the packet is the downstream interface address. The TTL value of the packet is 1. The Assert message carries the route cost from the PIM device to the source or RP, priority of the used unicast routing protocol, and the group address.

The router compares its information with the information carried in the message sent by its neighbor. This process is called Assert election. The election rules are as follows:

1. The router that runs a higher-priority unicast routing protocol wins.
2. If the routers have the same unicast routing protocol priority, the router with the smaller route cost to the source wins.
3. If the routers have the same priority and route cost, the router with the highest IP address for the downstream interface wins.

The router performs the following operations based on the Assert election result:

- If the router wins the election, the downstream interface of the router is responsible for forwarding multicast packets on the network segment. The downstream interface is called an Assert winner.
- If the router does not win the election, the downstream interface is prohibited from forwarding multicast packets and is deleted from the downstream interface list of the (S, G) entry. The downstream interface is called an Assert loser.
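
The Assert election rules can be sketched as an ordered comparison in Python. The sketch assumes that the routing protocol priority is expressed as a preference value where a smaller number means a higher priority; that encoding, and the names `AssertInfo` and `assert_winner`, are assumptions rather than details taken from this document.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class AssertInfo:
    address: IPv4Address  # downstream interface address
    preference: int       # routing protocol preference; smaller = higher priority (assumed)
    cost: int             # route cost to the source


def assert_winner(forwarders):
    """Return the Assert winner among the forwarders on one network segment."""
    return max(forwarders, key=lambda f: (
        -f.preference,    # 1. higher-priority unicast routing protocol
        -f.cost,          # 2. smaller route cost to the source
        int(f.address),   # 3. highest downstream interface address
    ))
```

Every forwarder other than the returned one is an Assert loser and removes its downstream interface from the (S, G) entry.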

After the Assert election is complete, only one upstream router with a downstream interface remains on the network segment, and that interface transmits only one copy of each multicast packet. The Assert winner then periodically sends Assert messages to maintain its status as the Assert winner. If the Assert loser does not receive an Assert message from the Assert winner before the loser's Assert timer expires, the loser re-adds its downstream interface for multicast data forwarding.

DR Switchover Delay

If an existing DR fails, the PIM neighbor relationship times out, and a new DR election is triggered.

By default, when an interface changes from a DR to a non-DR, the router immediately stops using the interface to forward data. If the new DR has not received multicast data, multicast data traffic is temporarily interrupted.

When a PIM-SM interface that has a PIM DR switchover delay configured receives Hello messages from a new neighbor and changes from a DR to a non-DR, the interface continues to function as a DR and to forward multicast packets until the delay times out.

If the router that has a DR switchover delay configured receives packets from a new DR before the delay expires, the router immediately stops forwarding packets. When a new IGMP Report message is received on the shared network segment, the new DR (instead of the original DR configured with a DR switchover delay) sends a PIM Join message to the upstream device.

If the new DR receives multicast data from the original DR before the DR switchover delay expires, an Assert election is triggered.
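
The effect of the DR switchover delay on forwarding can be sketched as a small state holder in Python. This is a simplified model with hypothetical names (`DrInterface` and its methods); it covers only the forwarding decision on the original DR and leaves out the Assert election.

```python
class DrInterface:
    def __init__(self, switchover_delay: float = 0.0):
        self.switchover_delay = switchover_delay  # 0 models the default behavior
        self.is_dr = True
        self.forward_until = None  # time at which forwarding stops after losing the DR role

    def on_lose_dr_role(self, now: float) -> None:
        """The interface changes from DR to non-DR; start the switchover delay."""
        self.is_dr = False
        self.forward_until = now + self.switchover_delay

    def on_packet_from_new_dr(self, now: float) -> None:
        """Packets from the new DR end the delay immediately."""
        if not self.is_dr:
            self.forward_until = now

    def is_forwarding(self, now: float) -> bool:
        if self.is_dr:
            return True
        return self.forward_until is not None and now < self.forward_until
```

With a delay of 0, `is_forwarding` becomes false as soon as the DR role is lost, which matches the default behavior described above.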

PIM-SM Administrative Domain

A PIM-SM network is divided into a global domain and multiple BSR administrative domains to simplify network management. Dividing the network into domains can reduce the workloads of a single BSR and can use private group addresses to provide special services for users in a specific domain.

Each BSR administrative domain has only one BSR, which serves the multicast groups within a specific address range. The global domain has a BSR that serves the other multicast groups.

The relationship between the BSR administrative domain and the global domain is described as follows in terms of the domain space, group address range, and multicast function.

Domain space

Figure 8 BSR administrative domain - domain space

As shown in Figure 8, different BSR administrative domains contain different routers. A router cannot belong to multiple BSR administrative domains. Each BSR administrative domain is independent and geographically isolated from other domains. A BSR administrative domain manages a multicast group for a specific address range. Multicast packets within this address range can be transmitted only in this BSR administrative domain and cannot exit the border of the domain.

The global domain contains all the routers on the PIM-SM network. Multicast packets that do not belong to a particular BSR administrative domain can be transmitted over the entire PIM network.
Group address range

Figure 9 BSR administrative domain - address range

Each BSR administrative domain provides services to the multicast groups within a specific address range. The group address ranges that different BSR administrative domains serve can overlap. However, a multicast group address that a BSR administrative domain serves is valid only in that domain because it is used as a private group address. As shown in Figure 9, the group address range of BSR1 overlaps with that of BSR3.

The multicast group that does not belong to any BSR administrative domain belongs to the global domain. That is, the group address range of the global domain is G-G1-G2.
Multicast function

As shown in Figure 8, the global domain and each BSR administrative domain have their respective C-RP and BSR devices. Devices only function in the domain to which they are assigned. Each BSR administrative domain has a BSR mechanism and RP elections that are independent of other domains.

Each BSR administrative domain has a border. Multicast information for this domain, such as C-RP Advertisement messages and Bootstrap messages, can be transmitted only within the domain. Multicast information for the global domain can be transmitted throughout the entire global domain and can traverse any BSR administrative domain.