Friday 24 May 2013

Investigating high CPU usage on Cisco switches

show processes cpu sorted | excl 0.00%  0.00%  0.00%

This command lists processes sorted by CPU usage and hides any that show 0.00% across all three averaging intervals (5 seconds, 1 minute, 5 minutes). If a process is over 5%, treat that as a likely problem. Google the process name to see what it does and take it from there. View the graphs in your monitoring system to narrow down when the high usage started, and check the logs from the switch around that time.
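As a rough illustration of the 5% check, here is a minimal Python sketch that filters saved `show processes cpu sorted` output. The sample text, the process names and the helper name `busy_processes` are made up for illustration, and the column layout is assumed to be the usual PID / Runtime(ms) / Invoked / uSecs / 5Sec / 1Min / 5Min / TTY / Process format; adjust the pattern if your platform prints something different.

```python
import re

# Illustrative sample only: the process names and numbers are made up.
SAMPLE = """\
CPU utilization for five seconds: 12%/3%; one minute: 10%; five minutes: 9%
 PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
 123      501234     1200345        417  7.51%  6.02%  5.80%   0 Example Proc A
 456      100200      300400        333  1.20%  0.90%  0.85%   0 Example Proc B
"""

# One data row: PID, three counters, three percentages, TTY, process name.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s+\d+\s+\d+\s+\d+\s+"
    r"(?P<s5>[\d.]+)%\s+(?P<m1>[\d.]+)%\s+(?P<m5>[\d.]+)%\s+\d+\s+"
    r"(?P<name>.+?)\s*$"
)


def busy_processes(output, threshold=5.0):
    """Return (name, pid, five_sec_pct) for rows whose 5Sec value exceeds threshold."""
    hits = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m and float(m.group("s5")) > threshold:
            hits.append((m.group("name"), int(m.group("pid")), float(m.group("s5"))))
    return hits


if __name__ == "__main__":
    for name, pid, pct in busy_processes(SAMPLE):
        print(f"{name} (PID {pid}): {pct:.2f}%")
```

The threshold is a parameter because 5% is only the rule of thumb used in this post, not a value defined by the switch.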

Catalyst 9200 (IOS XE 16.x) troubleshooting steps
https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-xe-16/213549-troubleshoot-high-cpu-usage-in-catalyst.html

  • IOSd
  • LSMPI
  • FED 
  • Doppler ASIC
  • Physical interface

IOSd: This is the Cisco IOS® daemon that runs on the Linux kernel. It is run as a software process within the kernel.

LSMPI: Linux Shared Memory Punt Interface.

Forwarding Engine Driver (FED): This is the heart of the Cisco Catalyst switch and is responsible for all hardware programming/forwarding.


  • Packet Delivery System (PDS): This is the architecture and process of how packets are delivered to and from the various subsystems. As an example, it controls how packets are delivered from the FED to the IOSd and vice versa
  • Control Plane (CP): The control plane is a generic term used to group together the functions and traffic that involve the CPU of the Catalyst Switch. This includes traffic such as Spanning Tree Protocol (STP), Hot Standby Router Protocol (HSRP), and routing protocols that are destined to the switch, or sent from the switch. This also includes application layer protocols like Secure Shell (SSH), and Simple Network Management Protocol (SNMP) that must be handled by the CPU
  • Data Plane (DP): Typically the data plane encompasses the hardware ASICs and traffic that is forwarded without assistance from the Control Plane
  • Punt: An ingress protocol control packet that is intercepted by the DP and sent to the CP for processing
  • Inject: A protocol packet generated by the CP and sent to the DP to egress on the I/O interface(s)

show processes cpu sorted 5min | e 0.00%  0.00%  0.00%
Look for the process with the highest execution time.

show platform hardware fed switch active qos queue stats internal cpu policer

show platform software fed switch active punt cause summary
show platform software fed switch active punt cause clear
show platform software fed switch active punt cause summary
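The three commands above take a summary, clear the counters, and take a fresh summary, so the second output only reflects what has been punted since the clear. A minimal Python sketch for ranking that second output by received packets is shown here. The sample text is hypothetical (cause numbers, names and counters are made up), the layout of "cause number, cause info, Rcvd, Dropped" is an assumption about how the summary is printed, and `top_punt_causes` is a helper name invented for this example.

```python
import re

# Hypothetical sample: the layout is assumed and the counters are made up.
SAMPLE = """\
Statistics for all causes

Cause  Cause Info                      Rcvd                 Dropped
------------------------------------------------------------------------------
7      ARP request or response         1520                 0
11     For-us data                     88                   0
55     For-us control                  40310                12
"""

# Cause number, free-text description, then two integer counters.
ROW = re.compile(r"^(\d+)\s+(.+?)\s{2,}(\d+)\s+(\d+)\s*$")


def top_punt_causes(output, limit=3):
    """Return up to `limit` (cause_info, rcvd, dropped) tuples, busiest first."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(2), int(m.group(3)), int(m.group(4))))
    return sorted(rows, key=lambda r: r[1], reverse=True)[:limit]


if __name__ == "__main__":
    for info, rcvd, dropped in top_punt_causes(SAMPLE):
        print(f"{info}: rcvd={rcvd} dropped={dropped}")
```

The cause at the top of the list is the one to chase with the per-queue and per-interface punt rate commands that follow.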

show platform software fed switch active punt cpuq rates | e 0        0        0        0        0        0

show platform software fed switch active punt rates interfaces

show platform software fed switch active punt rates interfaces 0x000001d2

show platform software fed switch active punt rates interfaces 0x000001d2 | e 0        0        0        0

show monitor capture cpuCap buffer brief

show monitor capture cpuCap buffer detailed

Packet captures on switch
https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst3850/software/release/16-3/configuration_guide/b_163_consolidated_3850_cg/b_163_consolidated_3850_cg_chapter_01001011.html

show processes cpu history
In the graphs, * marks the maximum CPU usage (spikes) and # marks the average.


Script for Intermittent High CPU
From https://www.cisco.com/c/en/us/support/docs/ios-nx-os-software/ios-xe-16/213549-troubleshoot-high-cpu-usage-in-catalyst.html#anc28

If the high CPU on the switch is intermittent, you can set up a script on the switch that automatically runs these commands when a high CPU event occurs. The entry-val sets how high the CPU must be before the script triggers. The script monitors the 5-second CPU average SNMP OID. Two files are written to flash: tac-cpu-<timestamp>.txt contains the command outputs, and tac-cpu-<timestamp>.pcap contains the CPU ingress capture. These files can then be reviewed at a later date.

config t
no event manager applet high-cpu authorization bypass
event manager applet high-cpu authorization bypass
event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.3.1 get-type next entry-op gt entry-val 80 poll-interval 1 ratelimit 300 maxrun 180
action 0.01 syslog msg "High CPU detected, gathering system information."
action 0.02 cli command "enable"
action 0.03 cli command "term exec prompt timestamp"
action 0.04 cli command "term length 0"
action 0.05 cli command "show clock"
action 0.06 regex "([0-9]|[0-9][0-9]):([0-9]|[0-9][0-9]):([0-9]|[0-9][0-9])" $_cli_result match match1
action 0.07 string replace "$match" 2 2 "."
action 0.08 string replace "$_string_result" 5 5 "."
action 0.09 set time $_string_result
action 1.01 cli command "show proc cpu sort | append flash:tac-cpu-$time.txt"
action 1.02 cli command "show proc cpu hist | append flash:tac-cpu-$time.txt"
action 1.03 cli command "show proc cpu platform sorted | append flash:tac-cpu-$time.txt"
action 1.04 cli command "show interface | append flash:tac-cpu-$time.txt"
action 1.05 cli command "show interface stats | append flash:tac-cpu-$time.txt"
action 1.06 cli command "show log | append flash:tac-cpu-$time.txt"
action 1.07 cli command "show ip traffic | append flash:tac-cpu-$time.txt"
action 1.08 cli command "show users | append flash:tac-cpu-$time.txt"
action 1.09 cli command "show platform software fed switch active punt cause summary | append flash:tac-cpu-$time.txt"
action 1.10 cli command "show platform software fed switch active cpu-interface | append flash:tac-cpu-$time.txt"
action 1.11 cli command "show platform software fed switch active punt cpuq all | append flash:tac-cpu-$time.txt"
action 2.08 cli command "no monitor capture tac_cpu"
action 2.09 cli command "monitor capture tac_cpu control-plane in match any file location flash:tac-cpu-$time.pcap"
action 2.10 cli command "monitor capture tac_cpu start" pattern "yes"
action 2.11 cli command "yes"
action 2.12 wait 10
action 2.13 cli command "monitor capture tac_cpu stop"
action 3.01 cli command "term default length"
action 3.02 cli command "terminal no exec prompt timestamp"
action 3.03 cli command "no monitor capture tac_cpu" 
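Actions 0.05 to 0.09 of the applet build the timestamp used in the file names: they pull hh:mm:ss out of the `show clock` output, then replace the characters at index 2 and index 5 (the two colons) with dots. The same steps can be sketched in Python as follows. The function names and the sample clock string are made up for illustration, and the pattern is written as `\d{1,2}` rather than copied from the applet so that Python's regex engine picks up both digits of each field.

```python
import re

# Python equivalent of the applet's hh:mm:ss pattern (one or two digits per field).
TIME_RE = re.compile(r"(\d{1,2}):(\d{1,2}):(\d{1,2})")


def replace_at(text, index, char):
    """Replace the single character at `index`, like 'string replace <s> i i <char>'."""
    return text[:index] + char + text[index + 1:]


def capture_file_names(show_clock_output):
    """Return the (.txt, .pcap) flash file names derived from 'show clock' output."""
    m = TIME_RE.search(show_clock_output)
    if m is None:
        raise ValueError("no hh:mm:ss time found")
    stamp = m.group(0)                 # e.g. "14:23:45"
    stamp = replace_at(stamp, 2, ".")  # action 0.07
    stamp = replace_at(stamp, 5, ".")  # action 0.08
    return f"flash:tac-cpu-{stamp}.txt", f"flash:tac-cpu-{stamp}.pcap"


if __name__ == "__main__":
    # Made-up clock string for illustration.
    print(capture_file_names("*14:23:45.123 UTC Fri May 24 2013"))
```

Note that the fixed indexes 2 and 5 only line up with the colons when the hour and minute are both printed with two digits.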
