mlagA.10:18:17(config)#show ver
Arista DCS-7508
Hardware version: 06.00
....
Software image version: 4.22.0.1F
The CPU is only 64% idle, so the load is not low. The top consumers are two ksoftirqd threads.
mlagA.10:18:13(config)#show proc top once | more
%Cpu(s): 29.6 us, 3.9 sy, 0.0 ni, 64.0 id, 0.1 wa, 0.5 hi, 2.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
23 root 20 0 0 0 0 R 89.6 0.0 26:04.70 ksoftirqd/2
17 root 20 0 0 0 0 R 68.5 0.0 24:27.80 ksoftirqd/1
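The summary line averages over all cores, while the per-process %CPU column is relative to a single core, so a 64% idle box can still have individual cores close to saturation. A minimal sketch of pulling both numbers out of the pasted output follows; the helper names and the 60% threshold are illustrative assumptions, not part of EOS.

```python
import re

def parse_cpu_summary(line):
    """Turn a top '%Cpu(s):' summary line into {'us': 29.6, 'id': 64.0, ...}."""
    fields = re.findall(r"([\d.]+)\s+([a-z]{2})", line)
    return {name: float(value) for value, name in fields}

def parse_process_line(line):
    """Pick PID, %CPU and COMMAND out of one top process row (default column order)."""
    parts = line.split()
    return {"pid": int(parts[0]), "cpu": float(parts[8]), "command": parts[11]}

summary = parse_cpu_summary(
    "%Cpu(s): 29.6 us, 3.9 sy, 0.0 ni, 64.0 id, 0.1 wa, 0.5 hi, 2.0 si, 0.0 st"
)
rows = [
    "23 root 20 0 0 0 0 R 89.6 0.0 26:04.70 ksoftirqd/2",
    "17 root 20 0 0 0 0 R 68.5 0.0 24:27.80 ksoftirqd/1",
]
processes = [parse_process_line(r) for r in rows]

busy = 100.0 - summary["id"]  # aggregate busy percentage across all cores
# Hypothetical threshold: flag anything using more than 60% of one core.
hot = [p["command"] for p in processes if p["cpu"] > 60.0]
print(busy, hot)
```

Both ksoftirqd threads are flagged, which points at packet processing in the kernel rather than at a user-space agent.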
I suspect unexpected traffic is hitting the CPU, so check the output of "show cpu counters queue | nz".
yo412.mlagA.10:22:38(config)#clear counters
!!! even a single command, clear counters, takes almost 10 sec to complete !!!
yo412.mlagA.10:22:47(config)#show cpu counters queue | nz | more
Arad3/0:
CoPP Class Queue Pkts Octets DropPkts DropOctets
Aggregate
-----------------------------------------------------------------------------------------------------------------
CoppSystemL3LpmOverflow Et3/6/1 1753 473344 74945 21049856
CoppSystemL3LpmOverflow Et3/6/2 1112 307200 73605 20702976
CoppSystemL3LpmOverflow Et3/6/3 610 166912 86302 23954432
CoppSystemL3LpmOverflow Et3/6/4 1178 320256 77089 21414656
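To put numbers on the counters above, here is a rough sketch that computes how much of the L3LpmOverflow traffic CoPP drops on each port. It assumes that Pkts counts packets admitted to the CPU and DropPkts counts packets discarded by the policer, with no overlap between the two; the function name is made up for illustration.

```python
def parse_copp_row(line):
    """Split one 'show cpu counters queue' row into named integer fields."""
    copp_class, queue, pkts, octets, drop_pkts, drop_octets = line.split()
    return {
        "class": copp_class,
        "queue": queue,
        "pkts": int(pkts),
        "drop_pkts": int(drop_pkts),
    }

rows = [
    "CoppSystemL3LpmOverflow Et3/6/1 1753 473344 74945 21049856",
    "CoppSystemL3LpmOverflow Et3/6/2 1112 307200 73605 20702976",
    "CoppSystemL3LpmOverflow Et3/6/3 610 166912 86302 23954432",
    "CoppSystemL3LpmOverflow Et3/6/4 1178 320256 77089 21414656",
]
counters = [parse_copp_row(r) for r in rows]

for c in counters:
    # Assumption: offered packets = admitted + dropped.
    offered = c["pkts"] + c["drop_pkts"]
    c["drop_ratio"] = c["drop_pkts"] / offered
    print(f'{c["queue"]}: {c["drop_ratio"]:.1%} dropped')

total_admitted = sum(c["pkts"] for c in counters)
total_dropped = sum(c["drop_pkts"] for c in counters)
print(total_admitted, total_dropped)
```

Under that assumption CoPP drops well over 95% of this class on every port, yet the small admitted remainder, multiplied across a full chassis, is still significant for the supervisor CPU.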
It looks like a lot of packets are hitting the CPU. CoPP filters out most of them, but this is a fully loaded chassis, and the aggregate punted traffic is still too heavy for an x86 CPU.
Try running tcpdump on the packets that arrive on et3/6/1 and are punted to the CPU. Surprisingly, there are not many...
mlagA.10:35:23(config)#bash tcpdump -nvvi et3_6_1
tcpdump: listening on et3_6_1, link-type EN10MB (Ethernet), capture size 262144 bytes
10:35:36.066689 00:1c:73:46:0d:b0 > 01:80:c2:00:00:02, ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
10:35:39.530341 00:1c:73:3b:e0:22 > 01:80:c2:00:00:02, ethertype Slow Protocols (0x8809), length 124: LACPv1, length 110
^C
2 packets captured
Try mirroring this port to the CPU and running tcpdump on the mirror interface. (This feature is only supported on 7500E/R or 7280R devices.)
mlagA.10:37:46(config)#monitor session 1 source et3/6/1 rx
mlagA.10:39:12(config)#monitor session 1 destination cpu
mlagA.10:39:15(config)#bash tcpdump -nvi mirror0
tcpdump: listening on mirror0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:40:08.198512 1e:af:14:08:18:02 > 00:aa:aa:aa:bb:cc, ethertype 802.1Q (0x8100), length 252: vlan 1408, p 0, ethertype IPv4,
100.14.8.119.30485 > 220.200.16.1.24659: Flags [R.UW], seq 0:194, ack 0, win 61689, urg 0, length 194
10:40:08.199078 1e:af:14:09:18:01 > 00:aa:aa:aa:bb:cc, ethertype 802.1Q (0x8100), length 252: vlan 1409, p 0, ethertype IPv4,
100.14.9.118.30504 > 220.200.17.1.24648: Flags [PUEW], seq 0:194, win 62028, urg 0, length 194
Do we have a route for that destination? No....
mlagA.10:40:08(config)#sh ip route 220.200.17.1
VRF: default
....
Gateway of last resort is not set
Create a null route covering this prefix. The CLI responds faster, and "show cpu counters queue | nz" is back to normal now, with no L3LpmOverflow entries anymore.
mlagA.10:54:28(config)#ip route 220.200.0.0/16 null0
Arad3/0:
CoPP Class Queue Pkts Octets DropPkts DropOctets
Aggregate
-----------------------------------------------------------------------------------------------------------------
CoppSystemIgmp Et3/1/2 160 10240 0 0
CoppSystemIgmp Et3/1/4 160 10240 0 0
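As a quick sanity check on the null route configured above, the following sketch uses Python's standard ipaddress module to confirm that 220.200.0.0/16 covers both destinations seen in the mirror capture. The variable names are illustrative only.

```python
import ipaddress

# Prefix used in "ip route 220.200.0.0/16 null0".
null_route = ipaddress.ip_network("220.200.0.0/16")

# Destination addresses taken from the mirror0 tcpdump output.
observed_destinations = ["220.200.16.1", "220.200.17.1"]

covered = {
    ip: ipaddress.ip_address(ip) in null_route
    for ip in observed_destinations
}
print(covered)

# The sources (100.14.x.x) fall outside the prefix, so the route
# only affects traffic heading to the unrouted destinations.
source_covered = ipaddress.ip_address("100.14.8.119") in null_route
print(source_covered)
```

A /16 is broader than the two /24-sized ranges actually observed; whether a tighter prefix would be enough depends on what other destinations the unknown senders are targeting, which the capture above does not show.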
But the CPU is still busy, and the busiest process is now SandFap instead of ksoftirqd. Hmmm.... why?
mlagA.10:57:19(config)#sh proc top once | more
%Cpu(s): 30.2 us, 4.1 sy, 0.0 ni, 62.1 id, 0.1 wa, 0.5 hi, 3.1 si, 0.0 st
...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12874 root 20 0 1001m 357m 192m R 100.4 2.2 110:39.04 SandFap
13025 root 20 0 1001m 357m 192m S 69.6 2.2 110:54.53 SandFap
16765 root 20 0 1001m 359m 193m S 51.2 2.2 96:56.18 SandFap