F5 LTM Single VS to multiple pools port mapping

Scenario: a 1-to-1 mapping of ports on a single SSL-terminating IP to corresponding inside ports on a local server.

Rather than creating a VS on the same IP for each individual port, I created one pool per port – each containing the same node but with a different service port – and handled the port selection on a single VS with an iRule.

Eg:

pool0 – 192.168.1.100:45000
pool1 – 192.168.1.100:45001
pool2 – 192.168.1.100:45002
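
If you prefer the CLI, the pools can also be created from tmsh – a quick sketch using the member details above:

tmsh create ltm pool pool0 members add { 192.168.1.100:45000 }
tmsh create ltm pool pool1 members add { 192.168.1.100:45001 }
tmsh create ltm pool pool2 members add { 192.168.1.100:45002 }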

An iRule was then created on a VS listening on all ports to direct traffic as required. The relevant VS settings:

Source: 0.0.0.0/0
SSL Profile (Client): [ your SSL profile ]
VLAN and Tunnel Traffic: Enabled on… [ appropriate interface ]
Source Address Translation: Auto Map
Address Translation: Ticked
Port Translation: Ticked
Resources: [ Pick the iRule ]

iRule:

when CLIENT_ACCEPTED {
    switch [TCP::local_port] {
        "5000" { pool pool0 }
        "5001" { pool pool1 }
        "5002" { pool pool2 }
        default { reject }
    }
}

Seems to work OK!
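
For completeness, here's roughly what the equivalent wildcard virtual server looks like from tmsh. The VS name (vs_portmap), VS IP (10.0.0.10), client SSL profile (my_clientssl_profile), VLAN (external) and iRule name (port_select_rule) are all made up for illustration – substitute your own:

tmsh create ltm virtual vs_portmap destination 10.0.0.10:0 ip-protocol tcp \
    profiles add { tcp my_clientssl_profile } \
    vlans add { external } vlans-enabled \
    source-address-translation { type automap } \
    translate-address enabled translate-port enabled \
    rules { port_select_rule }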

NB: I found a gotcha here, as I was replacing an existing VS that listened on a specific port. If you have a VS for a specific port and shut it down, then create a VS on the same IP listening on all ports, incoming connections to the port of the shut-down VS will still be denied! You can work around this by changing the original VS to an unused port (so you can revert later), or simply delete it.

X11 forwarding over SSH on firewalled CentOS host

I had a few issues with X11 forwarding over SSH on one of my CentOS hosts. After a bit of fiddling, I discovered that there were a couple of things I hadn’t taken into account.

I’d set my PuTTY session up to allow X11 forwarding, and set the X display location to “localhost”. On the server, I installed xclock and its dependencies for testing, and set the following in /etc/ssh/sshd_config:

X11Forwarding yes
X11DisplayOffset 10
X11UseLocalhost yes

I restarted sshd; however, this still wasn’t working.

In short, I was missing two things:

1) xauth wasn’t installed. This is required!
2) I wasn’t allowing connections to localhost in my iptables config. This was fixed in my ruleset with:

iptables -A INPUT -i lo -j ACCEPT
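
One thing to watch: -A appends the rule, so it needs to end up above any catch-all REJECT/DROP. On a box using the iptables service (rather than firewalld), inserting it at the top of the chain and saving looks something like this:

iptables -I INPUT 1 -i lo -j ACCEPT
service iptables save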

After installing xauth and adding the firewall rule, I restarted sshd and it now works a treat!
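
For reference, on CentOS that amounts to something like the following (xorg-x11-xauth is the package that provides xauth; use systemctl restart sshd on systemd-based releases):

yum install xorg-x11-xauth
service sshd restart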

Nexus FEX Bouncing

I came across an odd problem where a FEX was bouncing, and was asked to take a look. The logs were a flood of interface up/down and FEX status messages; however, buried amongst them and quite easy to miss was the following, less frequent syslog message:

%SATCTRL-FEX132-2-SATCTRL_FEX_MISCONFIG: FEX-132 is being configured as 131 on different switch

Pretty obvious clue there. Configuration was correct for the uplinks on both 5Ks:

interface Ethernet1/13
  switchport mode fex-fabric
  fex associate 131
  channel-group 131

interface Ethernet1/14
  switchport mode fex-fabric
  fex associate 132
  channel-group 132

Checking the serial numbers of the attached FEXes confirmed the problem:


First 5K

FEX: 131 Description: FEX213 - CAB 28   state: Offline
  FEX version: 7.1(3)N1(1) [Switch version: 7.1(3)N1(1)]
  FEX Interim version: 7.1(3)N1(1)
  Switch Interim version: 7.1(3)N1(1)
  Extender Serial: FOC00011122

FEX: 132 Description: FEX214 - CAB 28   state: Online
  FEX version: 7.1(3)N1(1) [Switch version: 7.1(3)N1(1)]
  FEX Interim version: 7.1(3)N1(1)
  Switch Interim version: 7.1(3)N1(1)
  Extender Serial: FOC12345678

Second 5K


FEX: 131 Description: FEX213 - CAB 28   state: Registered
  FEX version: 7.1(3)N1(1) [Switch version: 7.1(3)N1(1)]
  FEX Interim version: 7.1(3)N1(1)
  Switch Interim version: 7.1(3)N1(1)

FEX: 132 Description: FEX214 - CAB 28   state: Online
  FEX version: 7.1(3)N1(1) [Switch version: 7.1(3)N1(1)]
  FEX Interim version: 7.1(3)N1(1)
  Switch Interim version: 7.1(3)N1(1)
  Extender Serial: FOC00011122

As we can see above, the same FEX (serial FOC00011122) is associated as FEX 131 on the first 5K and FEX 132 on the second. The solution was to verify which serial number belonged to which FEX in the cabinet and swap the cables on the two ports of the incorrectly patched 5K. Looks like someone had been doing some patching and put things back the wrong way around! O_o
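
For reference, the per-FEX state and Extender Serial information above is the sort of output you get from show fex detail; running it on both 5Ks and comparing the serials is the quickest way to spot this kind of mis-patch:

show fex detail
show fex detail | include Serial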

F5 HA Device Group Woes

HA Device Groups. Great when they work, but I’m not sure why F5 have to make setting this up such a headache. There’s something lacking in the QA department when setting up a device group for HA causes so many problems for so many people.

This can be immensely frustrating and can lead to hair loss, so here’s my checklist for getting things working.

Prerequisites:

- NTP must be sync'd or things won't work (verify with ntpstat from the command line)
- Make backups of the current configs before trying to fix things!

Caveats:

- When things aren't right, the below can cause unexpected failovers - arrange an outage window!

Procedure:

i)    Add the HA (Failover) VLAN (both boxes) and assign it to the relevant interface(s)
ii)   Add the HA (Failover) Self IPs (both boxes) - make sure Port Lockdown is "Allow Default"
iii)  Add the peer with Device Trust -> Peer List -> Add (both boxes)
iv)   On each box, under Devices (self):
          - Set Device Connectivity -> ConfigSync -> Failover LAN IP
          - Set Device Connectivity -> Network Failover -> Failover LAN IP (and also MGMT if desired)
          - Set Device Connectivity -> Connection Mirroring -> Failover LAN IP (and a secondary if desired)
v)    Create a device group with Device Groups -> Create (eg: FAILOVER-GROUP) (both boxes)
      with type sync-failover (not automatic sync)
vi)   Add the peer to the device group (select automatic sync if required)
vii)  Sync to peer(s) – select “overwrite config” (a tmsh sketch of the CLI equivalent follows below)
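
If you prefer to check or drive this from the CLI, the state checks and a manual sync can be done with tmsh – a rough sketch, assuming the device group is named FAILOVER-GROUP as in the example above:

# Prerequisite and state checks
ntpstat
tmsh show cm device
tmsh show cm sync-status

# Push this unit's config to the device group
tmsh run cm config-sync to-group FAILOVER-GROUP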

If this fails and devices still say disconnected, then:

i)   Reset Device Trust on both boxes 
           - Device Management > Device Trust > Reset Device Trust [retain current authority]
ii)  REBOOT the box with full_box_reboot at the command line (obviously needs an outage window if live)
iii) Delete the peers on both boxes, and if necessary, clear out any membership of 
        device groups. Also delete the device group.
iv)  Start from/verify step iii in the first section above again.

Don’t forget to set up your routes before setting up nodes that aren’t locally attached to your external LAN. ;)

As an aside, the latest problem was with the secondary, which had gone a bit nuts and was throwing a weird error when trying to sync – something along the lines of “couldn’t load node 71….etc”. We decided to reset it to defaults with tmsh load /sys config default before doing the above.
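
For the record, that reset boils down to the following from the bash prompt (take a backup first – it wipes the configuration back to defaults), with a save afterwards to make it stick:

tmsh load /sys config default
tmsh save /sys config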

F5 KB article for restoring default config:

https://support.f5.com/kb/en-us/solutions/public/13000/100/sol13127.html

Dead SYSIOC card in SRX1400

I had a strange issue where one member of an SRX cluster dropped out unexpectedly. No changes had been made and nothing had been physically touched.

Looking at the console, the cluster status was still primary, but none of the physical interfaces existed, the control links were down and fxp0 was down too – so basically zero network connectivity.

adminuser@JCLFWL02> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   0           primary        no       no
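
As an aside, a quick way to confirm the control (and fabric) link state alongside the status above is:

show chassis cluster interfaces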

The logs unearthed some nasty-looking messages, starting with this:

Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: kld_map_v: 0xffffffff8c000000, kld_map_p: 0xc000000
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: dog: ERROR - reset of uninitialized watchdog
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: Copyright (c) 1996-2014, Juniper Networks, Inc.

The show chassis hardware output indicated that the FPCs were present but no PICs were detected! Serial numbers have been removed for confidentiality.

adminuser@JCLFWL02> show chassis hardware
node1:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XXXXXXXXXXXX      SRX 1400
Midplane         REV 03   711-031012   XXXXXXXX          SRX1k Backplane
PEM 0            rev 03   740-032015   XXXXXXXXXXXXX     AC Power Supply
CB 0             REV 13   750-032544   XXXXXXXX          SRX1K-RE-12-10
  Routing Engine          BUILTIN      BUILTIN           Routing Engine
  CPP                     BUILTIN      BUILTIN           Central PFE Processor
  Mezz           REV 09   710-021035   XXXXXXXX          SRX HD Mezzanine Card
FPC 0            REV 17   750-032536   XXXXXXXX          SRX1k 1GE SYSIO
FPC 1            REV 12   750-032543   XXXXXXXX          SRX1k Dual Wide NPC+SPC Support Card
FPC 3            REV 19   710-017865   XXXXXXXX          BUILTIN NPC
Fan Tray         -N/A-    -N/A-        -N/A-             SRX 1400 Fan Tray

show chassis fpc pic-status indicated much the same.

adminuser@JCLFWL02> show chassis fpc pic-status
node1:
--------------------------------------------------------------------------
Slot 0   Offline      SRX1k 1GE SYSIO
Slot 1   Offline      SRX1k Dual Wide NPC+SPC Support Card
Slot 3   Offline      BUILTIN NPC

The short story is that we tried powering off and re-seating the SYSIOC, which brought everything back for a few hours before it all died again.

Replacement of the SYSIOC was required – not a big issue, as the config is stored on the RE – although Juniper do have a caveat article mentioning that the control links may not come back up once this is done. They recommend reapplying the cluster node member configuration.

Juniper KB Article Here

eg: (change cluster ID and node number as appropriate)

set chassis cluster cluster-id 1 node 1 reboot

As an additional note, it seems this card takes care of all sorts of internal communications, which led to some odd alarms:

adminuser@JCLFWL02> show chassis alarms
node1:
--------------------------------------------------------------------------
9 alarms currently active
Alarm time               Class  Description
2016-03-15 14:18:22 UTC  Major  FPC 3 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 1 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 0 misconfig
2016-03-15 14:06:32 UTC  Major  Fan Tray Failure
2016-03-15 14:06:22 UTC  Major  Muliple FANs Stuck
2016-03-15 14:06:11 UTC  Major  FPC 3 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 1 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 0 offline due to CPP disconnect
2016-03-15 14:06:07 UTC  Major  Host 0 fxp0 : Ethernet Link Down

adminuser@JCLFWL02> show chassis environment
node1:
--------------------------------------------------------------------------
Class Item                           Status     Measurement
Temp  PEM 0                          Absent
      PEM 1                          Absent
      Routing Engine 0               OK
      Routing Engine 1               Absent
      CB 0 Intake                    OK         32 degrees C / 89 degrees F
      CB 0 Exhaust A                 OK         37 degrees C / 98 degrees F
      CB 0 Mezz                      OK         34 degrees C / 93 degrees F
      FPC 0 Intake                   OK         32 degrees C / 89 degrees F
      FPC 0 Exhaust A                OK         31 degrees C / 87 degrees F
      FPC 1 Intake                   OK         28 degrees C / 82 degrees F
      FPC 1 Exhaust A                OK         28 degrees C / 82 degrees F
      FPC 1 XLR                      Testing
      FPC 3 Intake                   OK         28 degrees C / 82 degrees F
      FPC 3 Exhaust A                OK         29 degrees C / 84 degrees F
Fans  Fan 1                          Check
      Fan 2                          Check

Another side note – I wondered why one control link had been installed as fibre and the other as copper, and came across this from Juniper:

NOTE: When you use ge-0/0/11 as a control port, you must use a fiber SFP transceiver, but you can use copper or fiber SFP transceiver on ge-0/0/10.