[R1/DCS-7512N]:et11/36/1--{180.13.1.0/24}--et11/1:[R2/DCS-7050QX]
* both routers run 4.20.11M
* Between R1 and R2, ISIS/BFD is enabled shown as below
Router Configuration:
R1 configuration:
router isis isis.def
net 49.0001.0000.0000.0001.00
is-type level-1
log-adjacency-changes
redistribute connected
spf-interval 1 50 100
timers lsp generation 1 10 1000
!
address-family ipv4 unicast
maximum-paths 32
!
address-family ipv6 unicast
maximum-paths 32
!
interface Ethernet11/36/1
description UpEt_A_3.1
speed forced 10000full
no switchport
ip address 180.13.1.1/24
ipv6 address 180:13:1::1/64
isis enable isis.def
isis bfd
isis circuit-type level-1
isis network point-to-point
isis authentication mode md5 level-1
isis authentication key 7 QI3r+/7nP3Q= level-1
R2 configuration:
router isis isis.def
net 49.0001.0000.0000.0103.00
is-type level-1
log-adjacency-changes
spf-interval 1 50 100
timers lsp generation 1 10 1000
!
address-family ipv4 unicast
maximum-paths 32
!
address-family ipv6 unicast
maximum-paths 32
!
interface Ethernet11/1
description UpEt_A_3.1
speed forced 10000full
no switchport
ip address 180.13.1.100/24
ipv6 address 180:13:1::100/64
isis enable isis.def
isis bfd
isis circuit-type level-1
isis network point-to-point
isis authentication mode md5 level-1
isis authentication key 7 QI3r+/7nP3Q= level-1
Symptom:
R1: No ISIS and BFD neighbor
R1.13:53:14(config)#show isis nei | grep uspine3
R1.13:53:16(config)#show bfd nei | grep 180.13
R1.13:53:18(config)#
R2: ISIS neighbor stuck in INIT, and BFD is down
R2#sh isis nei
Instance VRF System Id Type Interface SNPA State Hold time Circuit Id
isis.def default R1 L1 Ethernet11/1 P2P INIT 26 2D
R2#sh bfd nei
VRF name: default
-----------------
DstAddr MyDisc YourDisc Interface Type LastUp LastDown LastDiag State
-------------------- --------------- --------------- --------------------- ------------ -------------------- -------------------- ----------------------- -----
180.13.1.1 2628313401 0 Ethernet11/1(47) normal 04/15/19 13:25 04/15/19 13:26 Nbr Signaled Down Down
Step1: R1/R2 - "show lldp nei" good and ping works, so no connectivity issue but high latency
-- show lldp nei --
R2#sh lldp nei
Port Neighbor Device ID Neighbor Port ID TTL
Et11/1 R2 Ethernet11/36/1 120
-- ping -- R2#ping 180.13.1.1
PING 180.13.1.1 (180.13.1.1) 72(100) bytes of data.
80 bytes from 180.13.1.1: icmp_seq=1 ttl=64 time=390 ms
80 bytes from 180.13.1.1: icmp_seq=2 ttl=64 time=437 ms
....
-- show interface <intf> | grep error --
R1.13:46:01(config)#sh int et11/36/1 | grep error
0 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
0 output errors, 0 collisions
R2.13:45:34(config)#sh int et11/1 | grep error
0 input errors, 0 CRC, 0 alignment, 0 symbol, 0 input discards
0 output errors, 0 collisions
Step2: R2's Down BFD shows NO incoming bfd pkts
-- show bfd neighbor dest-ip <ip> detail --
R2#sh bfd neighbors dest-ip 180.13.1.1 detail
VRF name: default
-----------------
Peer Addr 180.13.1.1, Intf Ethernet11/1, Type normal, State Down
VRF default, LAddr 180.13.2.100, LD/RD 2628313401/0
Session state is Down and not using echo function
Last Up Apr 15 13:25:27 2019
Last Down Apr 15 13:26:23 2019
Last Diag: Nbr Signaled Down
TxInt: 1000, RxInt: 1000, Multiplier: 3
Received RxInt: 300, Received Multiplier: 3
Rx Count: 0, Rx Interval (ms) min/max/avg: 0/0/0 last: never
Tx Count: 87681, Tx Interval (ms) min/max/avg: 508/996/870 last: 627 ms ago
Detect Time: 3000
Sched Delay: 1*TxInt: 87898, 2*TxInt: 0, 3*TxInt: 0, GT 3*TxInt: 0
R2 sends out BFD but no receiving
-- bash tcpdump -ni <intf> udp --
[admin@R2 ~]$ tcpdump -ni et11_1 udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:48:22.309536 28:99:3a:3b:15:39 > 44:4c:a8:a5:11:40, ethertype IPv4 (0x0800), length 66: 180.13.1.100.57657 > 180.13.1.1.bfd-control: BFDv1, Control, State Down, Flags: [none], length: 24
11:48:23.301574 28:99:3a:3b:15:39 > 44:4c:a8:a5:11:40, ethertype IPv4 (0x0800), length 66: 180.13.1.100.57657 > 180.13.1.1.bfd-control: BFDv1, Control, State Down, Flags: [none], length: 24
Same observation on R1, only receiving no sending
R1.13:53:18(config)#bash tcpdump -ni et11_36_2 udp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_36_2, link-type EN10MB (Ethernet), capture size 262144 bytes
13:55:10.568533 28:99:3a:3b:15:39 > 44:4c:a8:a5:11:40, ethertype IPv4 (0x0800), length 66: 180.13.2.100.57657 > 180.13.2.1.bfd-control: BFDv1, Control, State Down, Flags: [none], length: 24
13:55:11.445074 28:99:3a:3b:15:39 > 44:4c:a8:a5:11:40, ethertype IPv4 (0x0800), length 66: 180.13.2.100.57657 > 180.13.2.1.bfd-control: BFDv1, Control, State Down, Flags: [none], length: 24
ISIS should notify the BFD to establish session after ISIS neighbor up. So the symptom is narrowed down to ISIS neighbor issue.
Step 3: Tcpdump ISIS IIH (isis hello) on both sides
-- show ver | grep MAC to get system MAC --
R1#show ver | grep MAC
System MAC address: 444c.a8a5.1140
R2#show ver | grep MAC
System MAC address: 2899.3a3b.1539
R2: IIH 2-way and looks good, R1/R2 send IIH every 10 seconds
-- bash tcpdump -ni <intf> iih --
[admin@R2 ~]$ tcpdump -ni et11_1 iih
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:18:16.104275 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
11:18:18.000062 28:99:3a:3b:15:39 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0103, length 1497
11:18:24.723990 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
11:18:27.597320 28:99:3a:3b:15:39 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0103, length 1497
R1: IIH, only 1 way, from R1 to R2, no R2 to R1
[admin@bn303 ~]$ tcpdump -ni et11_36_2 iih
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_36_2, link-type EN10MB (Ethernet), capture size 262144 bytes
11:23:53.268387 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
11:24:02.823894 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
IIH packets are destinated to well-known LAN broadcast address (0900:2b00:0005). These packets can't reach from R2 to R1.
Step 4: Arping
Arping shows similar, R1 to R2 is good, but seeing broadcast packets (all ff) drop on the direction of R2 to R1 even with arp ping
* Arp ping is to send our brocast L2 arp request to probe hosts
[admin@R1 ~]$ arping -b -I et11_36_1 180.13.1.100
ARPING 180.13.1.100 from 180.13.1.1 et11_36_1
Unicast reply from 180.13.1.100 [28:99:3A:3B:15:39] 419.967ms
Unicast reply from 180.13.1.100 [28:99:3A:3B:15:39] 409.205ms
Unicast reply from 180.13.1.100 [28:99:3A:3B:15:39] 401.616ms
Unicast reply from 180.13.1.100 [28:99:3A:3B:15:39] 410.797ms
^CSent 4 probes (4 broadcast(s))
Received 4 response(s)
[admin@R2 ~]$ arping -b -I et11_1 180.13.1.1
ARPING 180.13.1.1 from 180.13.1.100 et11_1
^CSent 9 probes (9 broadcast(s))
Received 0 response(s)
[admin@R2 ~]$ tcpdump -ni et11_1 arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_1, link-type EN10MB (Ethernet), capture size 262144 bytes
11:46:04.859574 28:99:3a:3b:15:39 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 180.13.1.1 (Broadcast) tell 180.13.1.100, length 28
11:46:05.859645 28:99:3a:3b:15:39 > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 180.13.1.1 (Broadcast) tell 180.13.1.100, length 28
At this point, the issue is narrowed down to, broadcast packets from R2 to R1 are dropped unexpectedly, which causes R1 not seeing R2's IIH packets, then no ISIS/BFD neighbor.
Step 5: hardware issue, but who drops the packets?
Ok, now we can say, most likely the problem is hardware, but who? The best way to nail down the culprit is to mirror packets to cpu, to see if the broadcast packets leave the R2.
* The feature of mirroring interface packets to CPU is only supported on Sand (Jericho/Arad) platform.
R1.14:32:14(config)#monitor session 1 source ethernet 11/36/1 rx
R1.14:32:19(config)#monitor session 1 destination cpu
Seeing IIH from R2 to R1 on the mirror cpu. << This is the packets received on the wire of et11/36/1
R1.14:32:20(config)#bash tcpdump -ni mirror0 iih
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mirror0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:33:04.105249 28:99:3a:3b:15:39 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0103, length 1497
14:33:13.257565 28:99:3a:3b:15:39 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0103, length 1497
14:33:21.554871 28:99:3a:3b:15:39 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0103, length 1497
But not on the physical ethernet port << This is the packets received by the kernel via et11/36/1
R1.14:33:22(config)#bash tcpdump -ni et11_36_1 iih
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on et11_36_1, link-type EN10MB (Ethernet), capture size 262144 bytes
14:33:35.056748 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
14:33:42.990686 44:4c:a8:a5:11:40 > 09:00:2b:00:00:05, 802.3, length 1500: LLC, dsap OSI (0xfe) Individual, ssap OSI (0xfe) Command, ctrl 0x03: OSI NLPID IS-IS (0x83): p2p IIH, src-id 0000.0000.0001, length 1497
Now we know the broadcast packets are dropped on R1 between eth and kernel.
Step 6: Counters
And this is approved by the counter output on both R1 and R2
R2/Trident - No loss
R2#clear counters
R2#clear platform trident counters
R2#sh cpu counters queue | nz
--------------------------------------------------------------------------------
Linecard0/0
--------------------------------------------------------------------------------
Queue Counter/pkts* Drops/pkts
--------------- ------------------- -------------------
OSPF/ISIS 2 0
BFD 10 0
R2#sh platform trident counters interface e11/1 | egrep -i 'card|drop' | nz
R2#
R1/Jericho - Loss seen
R1#clear hardware counter drop
R1#clear counters
R1#sh cpu counters queue | grep 'Et11/36/1' | nz
CoppSystemLldp Et11/36/1 4 924 0 0
R1#sh hardware counter drop | grep o11/1
A Jericho11/1 DeqDeletePktCnt : 39148 : 2019-04-16 11:59:39 : 2019-04-16 11:59:39
P Jericho11/1 dropVoqInNullRoute : 3800339040 : 2019-04-16 11:59:39 : 2019-04-16 11:59:39
R2#sh hardware counter drop | grep o11/1
A Jericho11/1 DeqDeletePktCnt : 39153 : 2019-04-16 11:59:39 : 2019-04-16 11:59:45
P Jericho11/1 dropVoqInNullRoute : 3800605701 : 2019-04-16 11:59:39 : 2019-04-16 11:59:45
So, should goes to DeqDeletePktCnt or dropVoqInNullRoute. What does they mean?
No comments:
Post a Comment