Juniper SRX High CPU

Today we came across an issue where an SRX had very high CPU usage. After a bit of digging, it turned out to be the httpd process, which runs J-Web.

“show chassis routing-engine” normally gives output like the example below (this is not actual output from the box with the problem and is only intended as an example). On the affected box, however, user CPU was close to 100%.

user@TESTFW02> show chassis routing-engine
node0:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 55 degrees C / 131 degrees F
    Total memory               512 MB Max   394 MB used ( 77 percent)
      Control plane memory     336 MB Max   299 MB used ( 89 percent)
      Data plane memory        176 MB Max    95 MB used ( 54 percent)
    CPU utilization:
      User                       3 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  0 percent
      Idle                      87 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXX
    Start time                     2017-04-10 02:30:18 UTC
    Uptime                         127 days, 12 hours, 1 minute, 12 seconds
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.18       0.17       0.11

node1:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 52 degrees C / 125 degrees F
    Total memory               512 MB Max   415 MB used ( 81 percent)
      Control plane memory     336 MB Max   316 MB used ( 94 percent)
      Data plane memory        176 MB Max    97 MB used ( 55 percent)
    CPU utilization:
      User                      10 percent
      Background                 0 percent
      Kernel                    14 percent
      Interrupt                  0 percent
      Idle                      75 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXXX
    Start time                     2017-04-10 02:10:31 UTC
    Uptime                         127 days, 12 hours, 21 minutes, 1 second
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.21       0.20       0.15

To nail down the culprit, you can do the following:

user@TESTFW02> start shell
% top

Bear in mind that some platforms have a process that deliberately sits at a high CPU value in order to maintain performance (e.g. flowd_octeon). Check the Juniper documentation before jumping to conclusions about a particular process. We are looking for something unusual and pretty obvious.

The top output should hint at the culprit. In this case it was httpd (J-Web).

We could have restarted the service with:

restart web-management

However, we manage these boxes via Junos Space, which uses NETCONF, so for us it was safe to disable the service altogether from configuration mode:

delete groups node0 system services web-management
delete groups node1 system services web-management
delete system services web-management
commit

After exiting configuration mode, checking the routing-engine state with “show chassis routing-engine” showed CPU quickly coming back down to normal. Bear in mind that the figures appear to be a 1-minute rolling average, so it will take about a minute for them to normalise completely.
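If you collect this output from several boxes, a small script can do the eyeballing for you. The following is only a rough Python sketch, assuming the output looks like the example earlier in this post (other Junos releases may lay out the CPU section differently). The function names, the 90 percent threshold and the "re0" fallback label are made up for illustration and are not anything Juniper provides.

```python
import re

# Matches CPU lines such as "      User                       3 percent"
CPU_LINE = re.compile(r"^\s*(User|Background|Kernel|Interrupt|Idle)\s+(\d+) percent\s*$")
# Matches the per-node header printed on a chassis cluster, e.g. "node0:"
NODE_LINE = re.compile(r"^(node\d+):\s*$")


def parse_cpu(output):
    """Return {node: {field: percent}} from 'show chassis routing-engine' text."""
    result = {}
    node = "re0"  # hypothetical label, used only if there is no "nodeN:" header
    for line in output.splitlines():
        header = NODE_LINE.match(line)
        if header:
            node = header.group(1)
            continue
        cpu = CPU_LINE.match(line)
        if cpu:
            result.setdefault(node, {})[cpu.group(1).lower()] = int(cpu.group(2))
    return result


def busy_nodes(cpu, user_threshold=90):
    """List the nodes whose user CPU is at or above the (assumed) threshold."""
    return [n for n, fields in cpu.items() if fields.get("user", 0) >= user_threshold]


# Trimmed copy of the example output from earlier in this post
SAMPLE = """\
node0:
Routing Engine status:
    Total memory               512 MB Max   394 MB used ( 77 percent)
    CPU utilization:
      User                       3 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  0 percent
      Idle                      87 percent

node1:
Routing Engine status:
    Total memory               512 MB Max   415 MB used ( 81 percent)
    CPU utilization:
      User                      10 percent
      Background                 0 percent
      Kernel                    14 percent
      Interrupt                  0 percent
      Idle                      75 percent
"""

if __name__ == "__main__":
    cpu = parse_cpu(SAMPLE)
    print(cpu)
    print("Nodes with high user CPU:", busy_nodes(cpu))
```

With the healthy example output, no node is flagged. On a box in the state described above, with user CPU close to 100%, the affected node would show up in the list.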
