Juniper SRX High CPU

Today we came across an issue where an SRX had very high CPU usage. After a bit of digging, it turned out to be the httpd process, which runs J-Web.

“show chassis routing-engine” normally outputs something like the below (this is not actual output from the problem box and is only intended as an example); on the affected unit, user CPU was close to 100%:

user@TESTFW02> show chassis routing-engine
node0:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 55 degrees C / 131 degrees F
    Total memory               512 MB Max   394 MB used ( 77 percent)
      Control plane memory     336 MB Max   299 MB used ( 89 percent)
      Data plane memory        176 MB Max    95 MB used ( 54 percent)
    CPU utilization:
      User                       3 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  0 percent
      Idle                      87 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXX
    Start time                     2017-04-10 02:30:18 UTC
    Uptime                         127 days, 12 hours, 1 minute, 12 seconds
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.18       0.17       0.11

node1:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 52 degrees C / 125 degrees F
    Total memory               512 MB Max   415 MB used ( 81 percent)
      Control plane memory     336 MB Max   316 MB used ( 94 percent)
      Data plane memory        176 MB Max    97 MB used ( 55 percent)
    CPU utilization:
      User                      10 percent
      Background                 0 percent
      Kernel                    14 percent
      Interrupt                  0 percent
      Idle                      75 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXXX
    Start time                     2017-04-10 02:10:31 UTC
    Uptime                         127 days, 12 hours, 21 minutes, 1 second
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.21       0.20       0.15

To nail down the culprit, you can do the following:

user@TESTFW02> start shell
% top
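
If you would rather stay in the CLI, “show system processes extensive” gives the same top-style per-process view without dropping to a shell:

user@TESTFW02> show system processes extensive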

Bear in mind that some platforms have a process that deliberately sits at a high CPU value in order to maintain performance (e.g. flowd_octeon). Check against Juniper documentation before jumping to conclusions about a particular process. We are looking for something unusual and pretty obvious.

The top output should hint at the culprit. In this case it was httpd (J-Web).

We could have restarted with:

restart web-management 

However, we are managing via Junos Space, which uses NETCONF, so for us it was safe to disable the service entirely:

delete groups node0 system services web-management
delete groups node1 system services web-management
delete system services web-management
commit
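
After the commit, a quick check along these lines should come back empty, confirming the daemon is gone (this assumes the process shows up as httpd, as it did in top):

user@TESTFW02> show system processes | match httpd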

Exiting and checking the routing-engine state with “show chassis routing-engine” showed CPU quickly coming back down to normal. Bear in mind the figures look to be a one-minute rolling average, so it will take a minute or so for them to normalise completely.

Do NOT install HP OMI agent v12 on NNMi9 Servers!

As per the title, do NOT install or upgrade the OMI agent to v12 on NNMi boxes, as this will overwrite files in the /opt/OV/nonOV/perl directory. NNMi9 relies on Perl 5.8.8, but Perl 5.16.0 files are installed when the OMI agent goes on.

The first issues you will likely see are NNMi self-monitoring alerts; then you’ll notice you can’t run any of the .ovpl commands any more, failing with a message like this:

Can't locate OVNNMvars.pm in @INC (@INC contains: /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0 /opt/OV/nonOV/perl/a/lib/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/5.16.0 /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0/x86_64-linux-thread-multi /opt/OV/nonOV/perl/a/lib/site_perl/5.16.0 /opt/OV/nonOV/perl/a/lib/site_perl .) at /opt/OV/bin/nnmversion.ovpl line 19.
 BEGIN failed--compilation aborted at /opt/OV/bin/nnmversion.ovpl line 19.
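
If you suspect you have been hit, it is easy to check which Perl the install tree now carries. The binary path below is an assumption based on the @INC paths in the error above:

# /opt/OV/nonOV/perl/a/bin/perl -v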

It wasn’t fun getting back to a known state, but we managed in the end by looking at what files the agent had installed or modified and restoring backups. Avoid the pain by not getting into the situation in the first place, and make sure you’re backing up binaries as well as data files.

I believe the latest version of NNMi 10 uses Perl 5.16.0, so this shouldn’t be an issue there, but I’d personally still avoid installing any product that uses the same file structure.

Updating TACACS on an older WildPackets Omnipliance

I had to update the TACACS server details on an old Omnipliance, which proved to be quite confusing. I found the location of the settings, but each time we restarted the service they reverted back to the way they were.

Quite a simple solution in the end… The procedure is below.

$ ssh root@omnipliance1
# service omnid stop
# vi /etc/omni/engineconfig.xml

Edit the line in engineconfig.xml that holds the TACACS server details.

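As an illustration only, the entry looks something like the following (the element and attribute names here are assumptions, not the exact Omnipliance schema; match on the existing TACACS server address in your file):

<tacacs enabled="true" address="10.1.1.10" port="49" />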

Quit with :wq! and restart the service.

# service omnid start
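
Since the original problem was the service rewriting this file on restart, it is worth confirming the change has survived (the grep pattern assumes the entry contains the string “tacacs”):

# grep -i tacacs /etc/omni/engineconfig.xml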

Cisco ASA denying traffic between interfaces despite correct policy

After a bit of head-scratching and frustration, I found an issue on a Cisco ASA (v9.1) configuration that has caught me out twice before. I felt like a bit of an idiot for not clicking sooner.

Basically the policy was allowing “icmp any any” on both sides, along with the connection traffic from the specific external to internal interfaces, yet the logs were showing denials:

Inbound TCP connection denied from A.B.C.D/53112 to W.X.Y.Z/443 flags SYN  on interface Ext-2
Deny inbound icmp src Ext-2:A.B.C.D dst Ext-1:W.X.Y.Z (type 8, code 0)

The issue here was that both interfaces were configured with the same security-level. This is a hangover from the old PIX days, where communication between interfaces at the same security level is denied by default; the ACL policy doesn’t even come into it.

The fix in this case was to lower the security-level on the “dirty” interface to a more appropriate value. Alternatively, to allow ACLs to control communications between same-level interfaces, the following command can be used:

same-security-traffic permit inter-interface
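
To make that concrete, the broken state looked conceptually like this (the hardware port names and level values are made up for illustration):

interface GigabitEthernet0/0
 nameif Ext-1
 security-level 50
!
interface GigabitEthernet0/1
 nameif Ext-2
 security-level 50

With both interfaces at the same level, the ASA drops the traffic before the ACLs are consulted; lowering one side (e.g. security-level 40 under Ext-2) or adding the command above restores normal ACL-controlled behaviour.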

Juniper MX Port Numbering

Unfortunately the Juniper port numbering scheme can cause a lot of confusion for people not familiar with the layout, which makes explaining things to remote hands on site tricky.
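
For reference, Junos interface names break down as type-fpc/pic/port, all zero-indexed, so the name itself tells remote hands which slot and socket to look at. For example (the names below are illustrative):

xe-2/0/3    10-Gigabit Ethernet, FPC slot 2, PIC 0, port 3
ge-0/1/0    Gigabit Ethernet, FPC slot 0, PIC 1, port 0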

I made a quick reference which I use for some of our MX480s but the principle is the same on other hardware. This is handy for sending to third parties with a port highlighted to ensure they don’t go pulling the wrong cables. :)

[Image: juniper-ports (MX480 port numbering quick reference)]