4/25/2020

ECMP in MPLS L3 VPN


Consider the above topology
  • CE2 advertises prefix - 12.0.0.0/24 to both PE2 and PE3;
  • On PE2/PE3, this prefix falls in the same VRF but with different RDs, say 65100:2 and 65100:3;
  • Both PE2 and PE3 advertise this vpnv4 prefix along the path ASBR2/3 - ASBR1 - PE1.

1. On PE1, there is no ECMP under vpnv4, because the two routes carry different RDs and are therefore distinct vpnv4 prefixes


PE1#show bgp vpn-ipv4 12.0.0.0/24
BGP routing table information for VRF default
Router identifier 5.5.5.5, local AS number 65000
BGP routing table entry for IPv4 prefix 12.0.0.0/24, Route Distinguisher: 65100:2
 Paths: 1 available
  65100 65101
    11.11.11.11 from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:65000:1
      Remote MPLS label: 118012
BGP routing table entry for IPv4 prefix 12.0.0.0/24, Route Distinguisher: 65100:3
 Paths: 1 available
  65100 65101
    11.11.11.11 from 1.1.1.1 (1.1.1.1)
      Origin IGP, metric -, localpref 100, weight 0, valid, internal, best
      Extended Community: Route-Target-AS:65000:4364
      Remote MPLS label: 116707

2. In the VRF routing table, there is ECMP to this destination, because on the PE the ECMP entry is keyed by ASBR next hop + remote ASBR label. In this case, 11.11.11.11:116707 and 11.11.11.11:118012

PE1#show ip route vrf cust_1 12.0.0.0/24

VRF: cust_1
......
 B I      12.0.0.0/24 [200/0] via 11.11.11.11/32, LDP tunnel index 1, label 116707
                                 via 1.0.0.8, Ethernet2/1, label 100000
                              via 11.11.11.11/32, LDP tunnel index 1, label 118012
                                 via 1.0.0.8, Ethernet2/1, label 100000
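
The difference between steps 1 and 2 comes down to the lookup key. Below is a minimal Python sketch of that idea, not EOS internals. It assumes a simplified model where the vpnv4 table is keyed by (RD, prefix), the VRF table is keyed by prefix alone, and both route targets are imported into cust_1. The `Path` type and the function names are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    rd: str        # route distinguisher
    prefix: str
    next_hop: str
    label: int     # remote MPLS label


# The two paths PE1 receives, taken from the "show bgp vpn-ipv4" output.
paths = [
    Path("65100:2", "12.0.0.0/24", "11.11.11.11", 118012),
    Path("65100:3", "12.0.0.0/24", "11.11.11.11", 116707),
]


def vpnv4_table(paths):
    """Key by (RD, prefix): routes with different RDs never combine."""
    table = defaultdict(list)
    for p in paths:
        table[(p.rd, p.prefix)].append(p)
    return dict(table)


def vrf_table(paths):
    """Simplified import: the RD is dropped, so the key is the prefix alone.
    Each distinct (next hop, label) pair becomes one ECMP member."""
    table = defaultdict(set)
    for p in paths:
        table[p.prefix].add((p.next_hop, p.label))
    return dict(table)


vpnv4 = vpnv4_table(paths)
vrf = vrf_table(paths)
print({key: len(entries) for key, entries in vpnv4.items()})  # one path per (RD, prefix)
print(sorted(vrf["12.0.0.0/24"]))                             # two ECMP members
```

Even though both paths share the next hop 11.11.11.11, the labels differ, so the VRF route ends up with two members.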

3. The hardware routing table shows the ECMP index

PE1#show ip hardware ale vrf | egrep 'cust_1|VRF'
VRF Name             VRF ID Table ID
cust_1               15        65535

PE1#show platform jericho ip route 12.0.0.0/24
Tunnel Type: M(mpls), G(gre), MoG(mpls-over-gre),
             vxlan-o(vxlan outer-rewrite info), vxlan-i(vxlan inner-rewrite info)
CW - Control word
FL - Flow label
* - Routes in LEM
D - ECMP is divergent across switching chips
 ---------------------------------------------------------------------------------------------------------
|                                 Routing Table                                           |              |
|---------------------------------------------------------------------------------------------------------
|VRF|   Destination    |     |                    |     |       |                   | ECMP|  FEC | Tunnel
| ID|      Subnet      | Cmd |     Destination    | VID |Outlif |   MAC / CPU Code  |Index| Index|T Value
 ---------------------------------------------------------------------------------------------------------
|15 |12.0.0.0/24       |ROUTE| FEC 32831          |0    |  -    |                   |300  |  D   |M 116707
|15 |12.0.0.0/24       |ROUTE| FEC 32831          |0    |  -    |                   |300  |  D   |M 118012

4. But this behavior can exhaust the ECMP resource. In the output below, the roughly 1600 vpnv4 prefixes use up 875 of the 4095 ECMP entries.

PE1#show hardware capacity | grep -i ECMP
ECMP                                                   875      21%        3220             0          4095         875

PE1#show bgp vpn-ipv4 summary
BGP summary information for VRF default
Router identifier 5.5.5.5, local AS number 65000
Neighbor Status Codes: m - Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  1.1.1.1          4  65000           1693      1673    0    0 23:25:15 Estab   1602   1602
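
A quick sanity check of those numbers in Python. The values are copied from the two outputs above; reading the capacity columns as used / free / maximum is my assumption.

```python
used, free, maximum = 875, 3220, 4095   # from "show hardware capacity"
prefixes = 1602                         # PfxRcd from "show bgp vpn-ipv4 summary"

utilization = used / maximum
print(f"ECMP utilization: {utilization:.0%}")
print(f"used + free == maximum: {used + free == maximum}")
print(f"ECMP entries per vpnv4 prefix: {used / prefixes:.2f}")
```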

5. There is a workaround: disable FIB ECMP to lower the ECMP usage.

PE1#conf term
PE1(config)#no ip hardware fib hierarchical next-hop disabled << default config, but have to flip 
PE1(config)#router general
PE1(config-router-general)#rib fib fec ecmp emulated

PE1-lp402.17:32:38(config-router-general)#show hardware capacity | grep -i ECMP
ECMP                                                     0       0%        4095             0          4095         875
ECMP              Mpls                                   0       0%        4095             0          4095           0
ECMP              Routing                                0       0%        4095             0          4095         875
ECMP              VxlanOverlay                           0       0%        4095             0          4095           0
ECMP              VxlanTunnel                            0       0%        3891             0          3891           0


Now there is no ECMP anymore in the software or hardware routing table; the second path is installed as a backup.

PE1#show ip route vrf cust_1 12.0.0.0/24

VRF: cust_1
 B I      12.0.0.0/24 [200/0] via 11.11.11.11/32, LDP tunnel index 1, label 116707
                                 via 1.0.0.8, Ethernet2/1, label 100000
                              via 11.11.11.11/32, LDP tunnel index 1, label 118012, backup
                                 via 1.0.0.8, Ethernet2/1, label 100000

PE1#show platform jericho ip route 12.0.0.0/24
 ---------------------------------------------------------------------------------------------------------
|                                 Routing Table                                           |              |
|---------------------------------------------------------------------------------------------------------
|VRF|   Destination    |     |                    |     |       |                   | ECMP|  FEC | Tunnel
| ID|      Subnet      | Cmd |     Destination    | VID |Outlif |   MAC / CPU Code  |Index| Index|T Value
 ---------------------------------------------------------------------------------------------------------
|15 |12.0.0.0/24       |ROUTE| FEC 32830          |0    |  -    |                   |  -  |49624 |M 116707

6. You probably want to go one step further and ask why there is ECMP at all. Both PE2 and PE3 are Arista EOS devices, which allocate labels per VRF. This setup has only 8 VRFs, so why does the PE see 800+ labels?

Now let's check ASBR1, which receives about 800 prefixes from each of the 2 ASBRs, as expected

ASBR1#sh bgp vpn-ipv4 summary
BGP summary information for VRF default
Router identifier 1.1.1.1, local AS number 65000
Neighbor Status Codes: m - Under maintenance
  Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
....
  192.158.115.11   4  65100           1568      1819    0    0    1d00h Estab   800    800
  192.168.115.11   4  65100           1749      1827    0    0    1d00h Estab   802    802

But the number of unique labels differs between the two neighbors

ASBR1#sh bgp neighbors 192.168.115.11 vpn-ipv4 received-routes detail | grep Remote | awk '{print $4}' | sort | uniq | wc -l
8

ASBR1#sh bgp neighbors 192.158.115.11 vpn-ipv4 received-routes detail | grep Remote | awk '{print $4}' | sort | uniq | wc -l
800
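
The same count can be done offline on a saved capture. Here is a minimal Python sketch, assuming the detail output contains lines of the form `Remote MPLS label: <value>` as in the PE1 output earlier. The function name and the sample text are illustrative, not a real capture.

```python
import re

LABEL_RE = re.compile(r"Remote MPLS label:\s*(\d+)")


def count_unique_labels(detail_output: str) -> int:
    """Equivalent of: grep Remote | awk '{print $4}' | sort | uniq | wc -l"""
    labels = set()
    for line in detail_output.splitlines():
        match = LABEL_RE.search(line)
        if match:
            labels.add(int(match.group(1)))
    return len(labels)


# Illustrative sample only: three routes, two of them sharing one label.
sample = """\
      Remote MPLS label: 118012
      Remote MPLS label: 118012
      Remote MPLS label: 116707
"""
print(count_unique_labels(sample))
```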

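Neighbor 192.158.115.11, which is a Cisco XR device, sends 800 vpnv4 prefixes with per-prefix labels! Even though it receives per-VRF labels, it still reassigns per-prefix labels. Below, two prefixes arrive with the same received label 116390 but get different local labels, 16694 and 16695.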

RP/0/RSP1/CPU0:ASBR3#show bgp vpnv4 unicast rd 65100:101 12.0.0.0/24 detail | inc bel
Sat Apr 25 16:55:59.312 UTC
    Local Label: 16694 (with rewrite);
      Received Label 116390
RP/0/RSP1/CPU0:ASBR3#show bgp vpnv4 unicast rd 65100:101 12.0.1.0/24 detail | inc Label
Sat Apr 25 16:56:02.058 UTC
    Local Label: 16695 (with rewrite);
      Received Label 116390
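
To see why per-prefix labels on just one path inflate ECMP usage, here is a rough Python sketch. It assumes that prefixes with an identical set of (next hop, label) members can share one ECMP entry; that is my assumption, not something confirmed from the platform. The label values and the even split of 100 prefixes per VRF are made up for illustration, and `ecmp_groups` is a hypothetical helper.

```python
def ecmp_groups(num_vrfs: int, prefixes_per_vrf: int, per_prefix_on_second_path: bool) -> int:
    """Count distinct ECMP member sets across all VRF prefixes."""
    groups = set()
    for vrf in range(num_vrfs):
        for pfx in range(prefixes_per_vrf):
            # First path: per-VRF label, same for every prefix in the VRF.
            label_a = 100000 + vrf
            # Second path: per-VRF label, or a unique label for every prefix.
            if per_prefix_on_second_path:
                label_b = 200000 + vrf * prefixes_per_vrf + pfx
            else:
                label_b = 200000 + vrf
            groups.add(frozenset({("11.11.11.11", label_a), ("11.11.11.11", label_b)}))
    return len(groups)


print(ecmp_groups(8, 100, per_prefix_on_second_path=False))  # per-VRF labels on both paths
print(ecmp_groups(8, 100, per_prefix_on_second_path=True))   # per-prefix labels on one path
```

Under these assumptions, per-VRF labels on both paths need only one group per VRF, while per-prefix labels on either path force one group per prefix.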
