Dead SYSIOC card in SRX1400

I had a strange issue where one of the members of an SRX cluster dropped out unexpectedly. No changes had been made and nothing had been touched physically.

On the console, the node showed its cluster status as primary, but none of the physical interfaces existed, the control links were down and fxp0 was down too, so it had basically zero network connectivity.

adminuser@JCLFWL02> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   0           primary        no       no

The logs unearthed some nasty-looking messages, starting with this:

Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: kld_map_v: 0xffffffff8c000000, kld_map_p: 0xc000000
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: dog: ERROR - reset of uninitialized watchdog
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: Copyright (c) 1996-2014, Juniper Networks, Inc.

The show chassis hardware output indicated that the FPCs were there but no PICs were detected! Serial numbers have been removed for confidentiality.

adminuser@JCLFWL02> show chassis hardware
node1:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XXXXXXXXXXXX      SRX 1400
Midplane         REV 03   711-031012   XXXXXXXX          SRX1k Backplane
PEM 0            rev 03   740-032015   XXXXXXXXXXXXX     AC Power Supply
CB 0             REV 13   750-032544   XXXXXXXX          SRX1K-RE-12-10
  Routing Engine          BUILTIN      BUILTIN           Routing Engine
  CPP                     BUILTIN      BUILTIN           Central PFE Processor
  Mezz           REV 09   710-021035   XXXXXXXX          SRX HD Mezzanine Card
FPC 0            REV 17   750-032536   XXXXXXXX          SRX1k 1GE SYSIO
FPC 1            REV 12   750-032543   XXXXXXXX          SRX1k Dual Wide NPC+SPC Support Card
FPC 3            REV 19   710-017865   XXXXXXXX          BUILTIN NPC
Fan Tray         -N/A-    -N/A-        -N/A-             SRX 1400 Fan Tray

show chassis fpc pic-status indicated much the same.

adminuser@JCLFWL02> show chassis fpc pic-status
node1:
--------------------------------------------------------------------------
Slot 0   Offline      SRX1k 1GE SYSIO
Slot 1   Offline      SRX1k Dual Wide NPC+SPC Support Card
Slot 3   Offline      BUILTIN NPC

The short story is that we powered the chassis off and re-seated the SYSIOC, which brought everything back for a few hours before it all died again.

Replacement of the SYSIOC was required. This is not an issue as the config is stored on the RE, although Juniper do have a caveat article mentioning that the control links may not come back up once this is done. They recommend reapplying the cluster node member config.

Juniper KB Article Here

For example, from operational mode (change the cluster ID and node number as appropriate):

set chassis cluster cluster-id 1 node 1 reboot
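
Once the node has rebooted, a quick check along these lines should confirm whether the control links and cards have actually come back. This is only a suggested checklist of standard operational-mode commands, not something taken from the KB article, and the lines starting with # are annotations rather than commands.

```
# Run from operational mode on the node that had the SYSIOC replaced.
# Both nodes should be listed again, with neither showing as lost.
show chassis cluster status
# Control link and fabric link state.
show chassis cluster interfaces
# Heartbeat counters should be incrementing in both directions.
show chassis cluster control-plane statistics
# The slots should be Online again, with their PICs listed.
show chassis fpc pic-status
```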

As an additional note, it seems this card takes care of all sorts of internal communications, so its failure caused some odd alarms!

adminuser@JCLFWL02> show chassis alarms
node1:
--------------------------------------------------------------------------
9 alarms currently active
Alarm time               Class  Description
2016-03-15 14:18:22 UTC  Major  FPC 3 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 1 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 0 misconfig
2016-03-15 14:06:32 UTC  Major  Fan Tray Failure
2016-03-15 14:06:22 UTC  Major  Muliple FANs Stuck
2016-03-15 14:06:11 UTC  Major  FPC 3 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 1 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 0 offline due to CPP disconnect
2016-03-15 14:06:07 UTC  Major  Host 0 fxp0 : Ethernet Link Down

adminuser@JCLFWL02> show chassis environment
node1:
--------------------------------------------------------------------------
Class Item                           Status     Measurement
Temp  PEM 0                          Absent
      PEM 1                          Absent
      Routing Engine 0               OK
      Routing Engine 1               Absent
      CB 0 Intake                    OK         32 degrees C / 89 degrees F
      CB 0 Exhaust A                 OK         37 degrees C / 98 degrees F
      CB 0 Mezz                      OK         34 degrees C / 93 degrees F
      FPC 0 Intake                   OK         32 degrees C / 89 degrees F
      FPC 0 Exhaust A                OK         31 degrees C / 87 degrees F
      FPC 1 Intake                   OK         28 degrees C / 82 degrees F
      FPC 1 Exhaust A                OK         28 degrees C / 82 degrees F
      FPC 1 XLR                      Testing
      FPC 3 Intake                   OK         28 degrees C / 82 degrees F
      FPC 3 Exhaust A                OK         29 degrees C / 84 degrees F
Fans  Fan 1                          Check
      Fan 2                          Check

Another side note – I wondered why one control link had been installed as fibre and another had been installed as copper, and came across this from Juniper:

NOTE: When you use ge-0/0/11 as a control port, you must use a fiber SFP transceiver, but you can use copper or fiber SFP transceiver on ge-0/0/10.

SPAN Port on Juniper MX Series

Unfortunately, creating a SPAN port on a Juniper MX isn’t as easy as on Cisco kit or even, say, an SRX. You need to jump through a few hoops: creating a forwarding-options config, a firewall filter, and also a bit of a kludge on the SPAN interface, where a static ARP entry forces traffic out.

Here are some examples for IOS and NX-OS so you can see the difference.

Cisco IOS

monitor session 1 source interface Gi0/13 both
monitor session 1 destination interface Gi0/24

Cisco NX-OS

monitor session 1
  source interface Eth10/34
  destination interface Ethernet10/35
  no shut

interface Eth10/35
 switchport
 switchport monitor

Nice and simple. But not on JunOS. :(

All credit and thanks to this post which made it very easy to understand:
http://pingpros.blogspot.nl/2012/12/multiple-ports-port-mirror-on-juniper.html

The ports in this example are ge-5/2/7 for the port to be mirrored, and xe-4/3/0 for the port that connects to the Wireshark host or other monitoring device.

1) Set up the port-mirroring forwarding options. A rate of 1 mirrors every packet rather than sampling.

set forwarding-options port-mirroring input rate 1
set forwarding-options port-mirroring input run-length 1
set forwarding-options port-mirroring family inet output interface xe-4/3/0.0 next-hop 1.1.1.2
set forwarding-options port-mirroring family inet output no-filter-check

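2) Create a firewall filter which will mirror the port traffic. Term 1 has no from clause, so it matches every packet; port-mirror only copies the packet, and the accept in the same term is what lets the traffic through as well. That means term 2 is never actually reached in this filter. It only becomes necessary if you add match conditions to term 1, because anything that falls off the end of a Junos filter is discarded.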

set firewall family inet filter port-mirror term 1 then port-mirror
set firewall family inet filter port-mirror term 1 then accept
set firewall family inet filter port-mirror term 2 then accept
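
Term 2 starts doing real work as soon as term 1 is given a match condition. The following is a sketch only: the 192.0.2.10/32 host and the counter name mirrored-pkts are invented for illustration and are not part of the original setup. The lines starting with # are annotations.

```
# Hypothetical: mirror only traffic to or from one host (192.0.2.10 is a made-up example).
set firewall family inet filter port-mirror term 1 from address 192.0.2.10/32
# Hypothetical counter name, handy for checking that the term is being hit.
set firewall family inet filter port-mirror term 1 then count mirrored-pkts
set firewall family inet filter port-mirror term 1 then port-mirror
set firewall family inet filter port-mirror term 1 then accept
# Everything else must still be accepted, otherwise the implicit discard drops it.
set firewall family inet filter port-mirror term 2 then accept
```

With a filter like this, show firewall filter port-mirror would display the counter, which is a quick way to tell whether any packets matched at all.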

3) Apply the firewall filter to the port or ports that you want to mirror.

set interfaces ge-5/2/7 unit 0 family inet filter input port-mirror
set interfaces ge-5/2/7 unit 0 family inet filter output port-mirror

4) Configure the SPAN interface with an IP that doesn’t conflict with anything you’re already using within your network, and add a dummy ARP entry for the next-hop address so traffic is forced out of the interface. Remember to remove any other configuration on this interface beforehand if re-using, say, an access port. The MAC address is fictional.

set interfaces xe-4/3/0 unit 0 family inet address 1.1.1.1/30 arp 1.1.1.2 mac 00:11:22:33:44:55

Note that you can apply the same filter to an existing irb interface to SPAN an irb. This is less painful than trying to do a pure L2 SPAN, when it’s applicable.

set interfaces irb unit 900 family inet filter input port-mirror
set interfaces irb unit 900 family inet filter output port-mirror
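
After committing, a few operational-mode commands can help confirm that mirrored traffic is really leaving the SPAN port. This is a rough checklist using the interface and next-hop values from this example, not an official procedure, and the lines starting with # are annotations.

```
# Shows the port-mirroring configuration as the forwarding plane sees it.
show forwarding-options port-mirroring
# The dummy static ARP entry for the next hop should be listed.
show arp no-resolve | match 1.1.1.2
# Live counters: output packets on the SPAN port should climb while traffic is mirrored.
monitor interface xe-4/3/0
```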

Job done.

UPDATE: It seems Juniper has added “analyzer” functionality in more recent code. I’ll investigate this at a later date.