Juniper SRX High CPU

Today we came across an issue where an SRX had very high CPU usage. After a bit of digging it turned out to be the httpd process, which runs J-Web.

Normally “show chassis routing-engine” produces output like the example below (this is not actual output from the box with the problem and is only intended as an example). On the affected box, however, user CPU was close to 100%.

user@TESTFW02> show chassis routing-engine
node0:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 55 degrees C / 131 degrees F
    Total memory               512 MB Max   394 MB used ( 77 percent)
      Control plane memory     336 MB Max   299 MB used ( 89 percent)
      Data plane memory        176 MB Max    95 MB used ( 54 percent)
    CPU utilization:
      User                       3 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  0 percent
      Idle                      87 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXX
    Start time                     2017-04-10 02:30:18 UTC
    Uptime                         127 days, 12 hours, 1 minute, 12 seconds
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.18       0.17       0.11

node1:
--------------------------------------------------------------------------
Routing Engine status:
    Temperature                 52 degrees C / 125 degrees F
    Total memory               512 MB Max   415 MB used ( 81 percent)
      Control plane memory     336 MB Max   316 MB used ( 94 percent)
      Data plane memory        176 MB Max    97 MB used ( 55 percent)
    CPU utilization:
      User                      10 percent
      Background                 0 percent
      Kernel                    14 percent
      Interrupt                  0 percent
      Idle                      75 percent
    Model                          RE-SRX100B
    Serial ID                      XXXXXXXXXXX
    Start time                     2017-04-10 02:10:31 UTC
    Uptime                         127 days, 12 hours, 21 minutes, 1 second
    Last reboot reason             0x1000:reboot due to panic
    Load averages:                 1 minute   5 minute  15 minute
                                       0.21       0.20       0.15
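If you have a lot of boxes to check, the output above is easy enough to parse. Below is a minimal Python sketch that pulls the CPU figures per node out of pasted “show chassis routing-engine” text and flags nodes with little idle CPU. The function names, the 20 percent idle threshold and the node1 figures in the sample are my own assumptions for illustration, not anything Juniper provides.

```python
import re

CPU_LINE = re.compile(r"(User|Background|Kernel|Interrupt|Idle)\s+(\d+) percent")

def parse_re_cpu(text):
    """Return {node: {field: percent}} from 'show chassis routing-engine' text."""
    nodes = {}
    node = "local"          # used when the output has no nodeN: header (non-cluster)
    in_cpu = False
    for line in text.splitlines():
        stripped = line.strip()
        m = re.fullmatch(r"(node\d+):", stripped)
        if m:
            node = m.group(1)
            in_cpu = False
            continue
        if stripped.startswith("CPU utilization"):
            in_cpu = True
            nodes.setdefault(node, {})
            continue
        if in_cpu:
            m = CPU_LINE.fullmatch(stripped)
            if m:
                nodes[node][m.group(1).lower()] = int(m.group(2))
            else:
                in_cpu = False   # first non-CPU line ends the block
    return nodes

def busy_nodes(cpu, min_idle=20):
    """Nodes whose idle CPU is below min_idle percent (threshold is an assumption)."""
    return sorted(n for n, f in cpu.items() if f.get("idle", 100) < min_idle)

# Hypothetical sample: node0 is taken from the example above, node1 is made up.
SAMPLE = """\
node0:
Routing Engine status:
    CPU utilization:
      User                       3 percent
      Background                 0 percent
      Kernel                     9 percent
      Interrupt                  0 percent
      Idle                      87 percent
    Model                          RE-SRX100B

node1:
Routing Engine status:
    CPU utilization:
      User                      97 percent
      Background                 0 percent
      Kernel                     2 percent
      Interrupt                  0 percent
      Idle                       1 percent
"""

if __name__ == "__main__":
    print(busy_nodes(parse_re_cpu(SAMPLE)))
```

This only tells you which node is busy, not which process is responsible.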

To nail down the culprit, you can do the following:

user@TESTFW02> start shell
% top

Bear in mind that some platforms have a process that deliberately sits at a high CPU value in order to maintain performance (e.g. flowd_octeon). Check against the Juniper documentation before jumping to conclusions about a particular process. We are looking for something unusual and pretty obvious.

The top output should hint at the culprit. In this case it was httpd (JWEB).

We could have restarted with:

restart web-management 

However, we manage the box via Junos Space, which uses NETCONF, so for us it was safe to disable the service:

delete groups node0 system services web-management
delete groups node1 system services web-management
delete system services web-management
commit

After exiting configuration mode, “show chassis routing-engine” showed the CPU quickly coming back down to normal. Bear in mind the figures look like a 1-minute rolling average, so it will take a minute or so for them to normalise completely.

Juniper MX Port Numbering

Unfortunately the Juniper port numbering scheme can cause a lot of confusion for people not familiar with the layout, especially when you are explaining it to remote hands on site.

I made a quick reference which I use for some of our MX480s but the principle is the same on other hardware. This is handy for sending to third parties with a port highlighted to ensure they don’t go pulling the wrong cables. :)

[Image: juniper-ports]

Adding vlans to trunk on Juniper MX – behaviour

Just a small note on how the CLI on an MX behaves when you add VLANs to an existing trunk, as people are sometimes confused about doing this. It’s not really a problem as nothing happens until commit, but if you’re using set commands this works in a similar fashion to the Cisco “switchport trunk allowed vlan add xxx” command.

Existing config as below:

fe-0/0/3 {
    unit 0 {
        family bridge {
            interface-mode trunk;
            vlan-id-list [ 1 2 3 ];
        }
    }
}

Adding a vlan like this does not overwrite the existing config:

set interfaces fe-0/0/3 unit 0 family bridge vlan-id-list 5

Config is updated as below:

fe-0/0/3 {
    unit 0 {
        family bridge {
            interface-mode trunk;
            vlan-id-list [ 1 2 3 5 ];
        }
    }
}
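To make the behaviour concrete, here is a tiny Python model of what the set command did in the example above: the new ID is added to the existing list rather than replacing it. This is only an illustration of the observed result. The function names are mine, and ignoring an ID that is already present is an assumption rather than something tested above.

```python
def set_vlan_id_list(existing, new_ids):
    """Model of 'set ... vlan-id-list X': add IDs to the list, keep what is there."""
    result = list(existing)
    for vid in new_ids:
        if vid not in result:      # assumption: an ID already present is left alone
            result.append(vid)
    return result

def render(vlans):
    """Format the list the way it appears in the config."""
    return "vlan-id-list [ " + " ".join(str(v) for v in vlans) + " ];"

if __name__ == "__main__":
    print(render(set_vlan_id_list([1, 2, 3], [5])))   # vlan-id-list [ 1 2 3 5 ];
```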

Dead SYSIOC card in SRX1400

I had a strange issue where one of the members of an SRX cluster dropped out unexpectedly. No changes had been made and nothing had been touched physically.

When looking on the console, the cluster status was primary but none of the physical interfaces existed, control links were down and the fxp was down too – so basically zero network connectivity.

adminuser@JCLFWL02> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   0           primary        no       no
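If you collect this output from several clusters, a few lines of Python will pick out which nodes are reported as lost in each redundancy group. This is a rough sketch that assumes the column layout shown above; the function name is my own.

```python
import re

def lost_nodes(text):
    """Return {redundancy_group: [nodes whose Status column is 'lost']}."""
    result = {}
    group = None
    for line in text.splitlines():
        m = re.match(r"Redundancy group:\s*(\d+)", line.strip())
        if m:
            group = int(m.group(1))
            result[group] = []
            continue
        parts = line.split()
        # Node rows look like: node0  0  lost  n/a  n/a
        if group is not None and len(parts) >= 3 and re.fullmatch(r"node\d+", parts[0]):
            if parts[2] == "lost":
                result[group].append(parts[0])
    return result

SAMPLE = """\
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   0           primary        no       no
"""

if __name__ == "__main__":
    print(lost_nodes(SAMPLE))
```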

The logs unearthed some nasty looking messages starting with this:

Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: kld_map_v: 0xffffffff8c000000, kld_map_p: 0xc000000
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: dog: ERROR - reset of uninitialized watchdog
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: Copyright (c) 1996-2014, Juniper Networks, Inc.

show chassis hardware output indicated that FPCs were there but no PICs detected! Serial numbers removed for confidentiality.

adminuser@JCLFWL02> show chassis hardware
node1:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XXXXXXXXXXXX      SRX 1400
Midplane         REV 03   711-031012   XXXXXXXX          SRX1k Backplane
PEM 0            rev 03   740-032015   XXXXXXXXXXXXX     AC Power Supply
CB 0             REV 13   750-032544   XXXXXXXX          SRX1K-RE-12-10
  Routing Engine          BUILTIN      BUILTIN           Routing Engine
  CPP                     BUILTIN      BUILTIN           Central PFE Processor
  Mezz           REV 09   710-021035   XXXXXXXX          SRX HD Mezzanine Card
FPC 0            REV 17   750-032536   XXXXXXXX          SRX1k 1GE SYSIO
FPC 1            REV 12   750-032543   XXXXXXXX          SRX1k Dual Wide NPC+SPC Support Card
FPC 3            REV 19   710-017865   XXXXXXXX          BUILTIN NPC
Fan Tray         -N/A-    -N/A-        -N/A-             SRX 1400 Fan Tray

show chassis fpc pic-status indicated much the same.

adminuser@JCLFWL02> show chassis fpc pic-status
node1:
--------------------------------------------------------------------------
Slot 0   Offline      SRX1k 1GE SYSIO
Slot 1   Offline      SRX1k Dual Wide NPC+SPC Support Card
Slot 3   Offline      BUILTIN NPC

The short story is that we tried powering off and re-seating the SYSIOC, which brought everything back for a few hours before it all died again.

Replacement of the SYSIOC was required. This was not an issue as the config is stored on the RE, although Juniper do have a caveat article mentioning that the control links may not come back up once this is done. They recommend reapplying the cluster node member config.

Juniper KB Article Here

For example (change the cluster ID and node number as appropriate):

set chassis cluster cluster-id 1 node 1 reboot

As an additional note, it seems this card takes care of all sorts of internal communications causing some odd alarms!

adminuser@JCLFWL02> show chassis alarms
node1:
--------------------------------------------------------------------------
9 alarms currently active
Alarm time               Class  Description
2016-03-15 14:18:22 UTC  Major  FPC 3 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 1 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 0 misconfig
2016-03-15 14:06:32 UTC  Major  Fan Tray Failure
2016-03-15 14:06:22 UTC  Major  Muliple FANs Stuck
2016-03-15 14:06:11 UTC  Major  FPC 3 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 1 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 0 offline due to CPP disconnect
2016-03-15 14:06:07 UTC  Major  Host 0 fxp0 : Ethernet Link Down

adminuser@JCLFWL02> show chassis environment
node1:
--------------------------------------------------------------------------
Class Item                           Status     Measurement
Temp  PEM 0                          Absent
      PEM 1                          Absent
      Routing Engine 0               OK
      Routing Engine 1               Absent
      CB 0 Intake                    OK         32 degrees C / 89 degrees F
      CB 0 Exhaust A                 OK         37 degrees C / 98 degrees F
      CB 0 Mezz                      OK         34 degrees C / 93 degrees F
      FPC 0 Intake                   OK         32 degrees C / 89 degrees F
      FPC 0 Exhaust A                OK         31 degrees C / 87 degrees F
      FPC 1 Intake                   OK         28 degrees C / 82 degrees F
      FPC 1 Exhaust A                OK         28 degrees C / 82 degrees F
      FPC 1 XLR                      Testing
      FPC 3 Intake                   OK         28 degrees C / 82 degrees F
      FPC 3 Exhaust A                OK         29 degrees C / 84 degrees F
Fans  Fan 1                          Check
      Fan 2                          Check
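The “show chassis alarms” output lists the newest alarm first, so the sequence of events reads bottom to top. Here is a small Python sketch that sorts pasted alarm lines oldest first, which makes the cascade (fxp0 link down, then the CPP disconnects, then the fan and misconfig alarms) easier to see. The regex assumes the timestamp format shown above and the function name is my own.

```python
import re
from datetime import datetime

ALARM_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) UTC\s+(\S+)\s+(.+)")

def parse_alarms(text):
    """Return [(time, class, description)] sorted oldest first."""
    alarms = []
    for line in text.splitlines():
        m = ALARM_RE.match(line.strip())
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            alarms.append((when, m.group(2), m.group(3).strip()))
    # sorted() is stable, so alarms with the same timestamp keep their listed order
    return sorted(alarms, key=lambda a: a[0])

# A few lines taken from the alarm output earlier in this post
SAMPLE = """\
2016-03-15 14:18:22 UTC  Major  FPC 3 misconfig
2016-03-15 14:06:32 UTC  Major  Fan Tray Failure
2016-03-15 14:06:11 UTC  Major  FPC 0 offline due to CPP disconnect
2016-03-15 14:06:07 UTC  Major  Host 0 fxp0 : Ethernet Link Down
"""

if __name__ == "__main__":
    for when, severity, description in parse_alarms(SAMPLE):
        print(when.time(), severity, description)
```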

Another side note – I wondered why one control link had been installed as fibre and another had been installed as copper, and came across this from Juniper:

NOTE: When you use ge-0/0/11 as a control port, you must use a fiber SFP transceiver, but you can use copper or fiber SFP transceiver on ge-0/0/10.

SPAN Port on Juniper MX Series

Unfortunately creating a SPAN port on a Juniper MX isn’t as easy as on Cisco kit or even, say, an SRX. You need to jump through a few hoops: creating a forwarding-options config, a firewall filter and also a bit of a kludge on the SPAN interface, where a static ARP entry is created to force traffic out.

Here are some examples for IOS and NX-OS so you can see the difference.

Cisco IOS

monitor session 1 source interface Gi0/13 both
monitor session 1 destination interface Gi0/24

Cisco NX-OS

monitor session 1
  source interface Eth10/34
  destination interface Ethernet10/35
  no shut

interface Eth10/35
 switchport
 switchport monitor

Nice and simple. But not on JunOS. :(

All credit and thanks to this post which made it very easy to understand:
http://pingpros.blogspot.nl/2012/12/multiple-ports-port-mirror-on-juniper.html

The ports in this example are ge-5/2/7 (the port to be mirrored) and xe-4/3/0 (the port that connects to Wireshark or another monitoring device).

1) Set up the port-mirroring forwarding options.

set forwarding-options port-mirroring input rate 1
set forwarding-options port-mirroring input run-length 1
set forwarding-options port-mirroring family inet output interface xe-4/3/0.0 next-hop 1.1.1.2
set forwarding-options port-mirroring family inet output no-filter-check

2) Create a firewall filter which will mirror the port traffic. I presume term 2 is required so that traffic is still allowed through as well as being mirrored.

set firewall family inet filter port-mirror term 1 then port-mirror
set firewall family inet filter port-mirror term 1 then accept
set firewall family inet filter port-mirror term 2 then accept

3) Apply the firewall filter to the port or ports that you want to mirror.

set interfaces ge-5/2/7 unit 0 family inet filter input port-mirror
set interfaces ge-5/2/7 unit 0 family inet filter output port-mirror

4) Configure the SPAN interface with an IP that doesn’t conflict with anything you’re already using within your network, and add a dummy ARP entry for the next-hop address so traffic is forced out of the interface. Remember to remove any other configuration on this interface beforehand if you are re-using, say, an access port. The MAC address is fictional.

set interfaces xe-4/3/0 unit 0 family inet address 1.1.1.1/30 arp 1.1.1.2 mac 00:11:22:33:44:55

Note that you can add the same config to an existing irb interface to SPAN an irb. This is less painful than trying to do pure L2 span when it’s applicable.

set interfaces irb unit 900 family inet filter input port-mirror
set interfaces irb unit 900 family inet filter output port-mirror

Job done.
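If you need to do this for several source ports, the set commands from steps 1 to 4 can be generated rather than typed. Below is a rough Python sketch that only emits the same commands shown above; the function and parameter names are my own, and the second source port in the usage example is hypothetical. It also checks that the dummy next-hop sits in the same subnet as the SPAN interface address, since the static ARP trick relies on that.

```python
import ipaddress

def port_mirror_commands(sources, mirror_port, local_cidr, next_hop, mac,
                         filter_name="port-mirror"):
    """Build the set commands from steps 1-4 for one or more source interfaces.

    sources is a list of (interface, unit) tuples, e.g. [("ge-5/2/7", 0)].
    """
    network = ipaddress.ip_interface(local_cidr).network
    if ipaddress.ip_address(next_hop) not in network:
        raise ValueError(f"next-hop {next_hop} is not inside {network}")

    cmds = [
        # 1) forwarding options
        "set forwarding-options port-mirroring input rate 1",
        "set forwarding-options port-mirroring input run-length 1",
        f"set forwarding-options port-mirroring family inet output "
        f"interface {mirror_port}.0 next-hop {next_hop}",
        "set forwarding-options port-mirroring family inet output no-filter-check",
        # 2) firewall filter
        f"set firewall family inet filter {filter_name} term 1 then port-mirror",
        f"set firewall family inet filter {filter_name} term 1 then accept",
        f"set firewall family inet filter {filter_name} term 2 then accept",
    ]
    # 3) apply the filter in both directions on every source
    for interface, unit in sources:
        for direction in ("input", "output"):
            cmds.append(f"set interfaces {interface} unit {unit} "
                        f"family inet filter {direction} {filter_name}")
    # 4) SPAN interface with the dummy ARP entry
    cmds.append(f"set interfaces {mirror_port} unit 0 family inet "
                f"address {local_cidr} arp {next_hop} mac {mac}")
    return cmds

if __name__ == "__main__":
    # ge-5/2/8 is a made-up second source port for illustration
    for cmd in port_mirror_commands([("ge-5/2/7", 0), ("ge-5/2/8", 0)],
                                    "xe-4/3/0", "1.1.1.1/30", "1.1.1.2",
                                    "00:11:22:33:44:55"):
        print(cmd)
```

Review the generated lines before pasting them into a box; nothing here validates interface names or checks for existing config on the SPAN port.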

UPDATE: It seems Juniper has added “analyzer” functionality in more recent code. I’ll investigate this at a later date.