Policy tracing from the CLI on firewalls

This post is just a quick reference on how to check from the CLI whether some traffic will be permitted through the rulebase.

Tracing policy matching on Juniper SRX (for actual live sessions, use ‘show security flow session’)

show security match-policies from-zone trust to-zone untrust source-port 1024 destination-port 40961 protocol tcp source-ip 10.243.0.1 destination-ip 10.243.15.12
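
To check traffic that is already flowing rather than doing a dry run, the flow session table can be filtered directly. A sketch using the same addresses as above (adjust the prefixes to suit):

show security flow session source-prefix 10.243.0.1 destination-prefix 10.243.15.12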

Tracing policy matching on Cisco ASA (for live connections, use ‘show conn’)

packet-tracer input inside tcp 10.0.0.1 1024 4.2.2.1 443 [detailed]
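
Similarly, to see whether a matching connection is already up on the ASA, the connection table can be filtered; a quick sketch using the inside host from the example above:

show conn address 10.0.0.1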

Tracing policy matching on Palo Alto

test security-policy-match application twitter-posting source-user acme\mcanha destination 199.59.150.7 destination-port 80 source 10.40.14.197 protocol 6 

NB: protocol 6 = TCP, 17 = UDP, 1 = ICMP
see https://www.iana.org/assignments/protocol-numbers/protocol-numbers.xhtml
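
To check live traffic on the Palo Alto rather than a simulated match, the session table can be filtered in much the same way; a sketch reusing the addresses from the example above:

show session all filter source 10.40.14.197 destination 199.59.150.7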

Cisco ASA denying traffic between interfaces despite correct policy

After a bit of head-scratching and frustration, I found a configuration issue on a Cisco ASA (v9.1) that has caught me out twice before. I felt like a bit of an idiot for not clicking sooner.

Basically, the policy was allowing “icmp any any” in both directions, along with the connection traffic from the specific external to internal interfaces. However, the logs were showing denials:

Inbound TCP connection denied from A.B.C.D/53112 to W.X.Y.Z/443 flags SYN  on interface Ext-2
Deny inbound icmp src Ext-2:A.B.C.D dst Ext-1:W.X.Y.Z (type 8, code 0)

The issue here was that both interfaces were configured with the same security-level. This is a hangover from the old PIX days, where this kind of communication was denied by default; the access policy doesn’t seem to matter.

The fix in this case was to lower the security-level on the “dirty” interface to a more appropriate value. Alternatively, to allow ACLs to control communications between same-security-level interfaces, the following command can be used:

same-security-traffic permit inter-interface
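
For reference, adjusting the security-level itself is just an interface sub-command; a minimal sketch (the interface name and value here are purely illustrative):

interface GigabitEthernet0/2
 nameif Ext-2
 security-level 50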

Dead SYSIOC card in SRX1400

I had a strange issue where one member of an SRX cluster dropped out unexpectedly. No changes had been made and nothing had been touched physically.

When looking on the console, the cluster status showed the node as primary, but none of the physical interfaces existed, the control links were down and fxp0 was down too, so there was basically zero network connectivity.

adminuser@JCLFWL02> show chassis cluster status
Cluster ID: 1
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   100         primary        no       no

Redundancy group: 1 , Failover count: 1
    node0                   0           lost           n/a      n/a
    node1                   0           primary        no       no

The logs unearthed some nasty-looking messages, starting with these:

Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: kld_map_v: 0xffffffff8c000000, kld_map_p: 0xc000000
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: dog: ERROR - reset of uninitialized watchdog
Mar 15 08:31:54  JCLFWL02 (FPC Slot 1, PIC Slot 0) SPC1_PIC0 kernel: Copyright (c) 1996-2014, Juniper Networks, Inc.

The ‘show chassis hardware’ output indicated that the FPCs were there but no PICs were detected! (Serial numbers removed for confidentiality.)

adminuser@JCLFWL02> show chassis hardware
node1:
--------------------------------------------------------------------------
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                XXXXXXXXXXXX      SRX 1400
Midplane         REV 03   711-031012   XXXXXXXX          SRX1k Backplane
PEM 0            rev 03   740-032015   XXXXXXXXXXXXX     AC Power Supply
CB 0             REV 13   750-032544   XXXXXXXX          SRX1K-RE-12-10
  Routing Engine          BUILTIN      BUILTIN           Routing Engine
  CPP                     BUILTIN      BUILTIN           Central PFE Processor
  Mezz           REV 09   710-021035   XXXXXXXX          SRX HD Mezzanine Card
FPC 0            REV 17   750-032536   XXXXXXXX          SRX1k 1GE SYSIO
FPC 1            REV 12   750-032543   XXXXXXXX          SRX1k Dual Wide NPC+SPC Support Card
FPC 3            REV 19   710-017865   XXXXXXXX          BUILTIN NPC
Fan Tray         -N/A-    -N/A-        -N/A-             SRX 1400 Fan Tray

show chassis fpc pic-status indicated much the same.

adminuser@JCLFWL02> show chassis fpc pic-status
node1:
--------------------------------------------------------------------------
Slot 0   Offline      SRX1k 1GE SYSIO
Slot 1   Offline      SRX1k Dual Wide NPC+SPC Support Card
Slot 3   Offline      BUILTIN NPC

The short story is that we tried powering off and re-seating the SYSIOC, which brought everything back for a few hours before it all died again.

Replacement of the SYSIOC was required. This was not an issue, as the config is stored on the RE, although Juniper do have a caveat article mentioning that the control links may not come back up once this is done; they recommend reapplying the cluster node member config.

Juniper KB Article Here

e.g. (change the cluster ID and node number as appropriate):

set chassis cluster cluster-id 1 node 1 reboot
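
Once the replacement SYSIOC is in and the node has rebooted with the cluster config reapplied, the usual commands can be used to confirm that the cluster, control links and PICs have come back, e.g.:

show chassis cluster status
show chassis cluster interfaces
show chassis fpc pic-status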

As an additional note, it seems this card takes care of all sorts of internal communications, causing some odd alarms!

adminuser@JCLFWL02> show chassis alarms
node1:
--------------------------------------------------------------------------
9 alarms currently active
Alarm time               Class  Description
2016-03-15 14:18:22 UTC  Major  FPC 3 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 1 misconfig
2016-03-15 14:18:22 UTC  Major  FPC 0 misconfig
2016-03-15 14:06:32 UTC  Major  Fan Tray Failure
2016-03-15 14:06:22 UTC  Major  Muliple FANs Stuck
2016-03-15 14:06:11 UTC  Major  FPC 3 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 1 offline due to CPP disconnect
2016-03-15 14:06:11 UTC  Major  FPC 0 offline due to CPP disconnect
2016-03-15 14:06:07 UTC  Major  Host 0 fxp0 : Ethernet Link Down

adminuser@JCLFWL02> show chassis environment
node1:
--------------------------------------------------------------------------
Class Item                           Status     Measurement
Temp  PEM 0                          Absent
      PEM 1                          Absent
      Routing Engine 0               OK
      Routing Engine 1               Absent
      CB 0 Intake                    OK         32 degrees C / 89 degrees F
      CB 0 Exhaust A                 OK         37 degrees C / 98 degrees F
      CB 0 Mezz                      OK         34 degrees C / 93 degrees F
      FPC 0 Intake                   OK         32 degrees C / 89 degrees F
      FPC 0 Exhaust A                OK         31 degrees C / 87 degrees F
      FPC 1 Intake                   OK         28 degrees C / 82 degrees F
      FPC 1 Exhaust A                OK         28 degrees C / 82 degrees F
      FPC 1 XLR                      Testing
      FPC 3 Intake                   OK         28 degrees C / 82 degrees F
      FPC 3 Exhaust A                OK         29 degrees C / 84 degrees F
Fans  Fan 1                          Check
      Fan 2                          Check

Another side note: I wondered why one control link had been installed as fibre and the other as copper, and came across this from Juniper:

NOTE: When you use ge-0/0/11 as a control port, you must use a fiber SFP transceiver, but you can use copper or fiber SFP transceiver on ge-0/0/10.

Checkpoint Connection Limit Woes

It’s been a while since I posted here so I thought I’d share something that’s been driving me absolutely insane for over a month at work.

We had episodes where the Checkpoint connections table on one of our internet stack firewalls was getting maxed out, and tracking down the cause proved extremely difficult. The first step was dumping the firewall’s connections table and doing a quick bit of analysis:

fw tab -t connections -f -u | awk '{print $9","$11","$13","$15","$43}' > /tmp/connections.txt
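
As an aside, if you only want to see how full the table currently is rather than dump the whole thing, the summary flag shows the current and peak number of entries:

fw tab -t connections -s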

Summary - top 20 sources
awk -F"," '{print $1}' /tmp/connections.txt | sort -n | uniq -c | sort -rn | head -20

Summary - top 20 destinations
awk -F"," '{print $3}' /tmp/connections.txt | sort -n | uniq -c | sort -rn | head -20
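
The same approach extends to the other columns in the dump; for example, assuming the fourth field of the CSV holds the destination port, a top-20 destination ports summary would be:

awk -F"," '{print $4}' /tmp/connections.txt | sort -n | uniq -c | sort -rn | head -20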

This didn’t really yield anything interesting, and the times at which the issue was happening were completely random. Increasing the connections table limit just moved the problem further up the stack to the perimeter firewall! The top sources were the proxies, which was to be expected. Not having access to the proxy logs was also a pain (when you’re in a big organisation, you can’t just jump on to their kit and take a look, sadly).

In the end we decided to create new service objects for http/https proxy, http and https etc. for our Proxy traffic rule and set their timeouts low (10 minutes). When we graphed the connections table, we noticed that the spikes died away after the low timeout we had specified, proving beyond all doubt that the issue was either user- or system-based, but only for those clients set to use the proxies.

After this I set up reports on our netflow collector to get some stats on traffic hitting the proxies, and did a bit of digging via awk to find the top destination IPs. Nothing out of the ordinary, certainly a lot of Google traffic, but that must be legit, right? So I turned it around and looked at client IPs to get a clue. We had to use realtime graphing on the Checkpoint to pick out exactly when the spikes were occurring so we could investigate the netflow data within a one-minute window; otherwise it was like looking for a needle in a haystack.

In the end, and to cut a long story short, we found that some users had installed Google Chrome on their development PCs. For some reason, Chrome was creating over 17 THOUSAND connections in a very short space of time, and somehow these weren’t being closed properly (whether by the browser or the proxy, I’m still not sure). I replicated this behaviour on a user’s desktop with two perfectly legitimate sites in two tabs. The netstat -an output on the user’s PC was not pretty: a scrolling mass of connections, either established or in TIME_WAIT. Netflow suggested that almost all of these connections were never used to actually transfer any data, so it must be something to do with Chrome’s network prediction (pre-connect) behaviour.
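
If you want to quantify that kind of behaviour on a client, a rough way to count connections by state (shown here with Unix-style tools, purely as an illustration) is:

netstat -an | awk '{print $NF}' | sort | uniq -c | sort -rn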

Anyway, we banned Chrome from user desktops and the issue has now gone away. I also discovered that an older version of Opera on another user’s desktop had the same problem.

I hope this helps someone else suffering the same weird issues. It’s not funny when your public IP PAT port pool for browsing gets exhausted during business hours thanks to some rogue browser going mental.

Quickly getting a Nokia IP appliance restored

How to quickly restore the base IPSO config (not firewall policies).

The config will previously have been backed up (/config/active) to a management server as myfirewall-active.txt.

1) scp the backed-up config to the Nokia IP appliance as /config/active.txt

testbox# scp myfirewall-active.txt admin@myappliance:/config/active.txt

2) rename the current config

myappliance# cd /config
myappliance# mv active active.old
myappliance# mv active.txt active

3) Reload the device

All base configuration is now restored (interfaces, static routes etc etc). Now you can establish SIC and push the policy.
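
The policy push itself is then the usual step from the management side, e.g. something along these lines from the SmartCenter CLI (the policy and gateway names are placeholders):

fwm load MyPolicy myappliance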