Check Point firewall: high interrupt CPU%

When this issue occurred, top showed that the vast majority of CPU time was in the interrupt category, despite low traffic levels. Failing over to the secondary cluster member did not fix the problem; the fault simply moved with it. The issue can be reproduced both on Nokia IP Appliances running IPSO and on newer Check Point platforms running Gaia.

last pid: 59653;  load averages:  0.05,  0.07,  0.02   up 571+16:11:35 12:19:30
45 processes:  1 running, 44 sleeping
CPU states:  1.8% user,  0.0% nice,  1.8% system, 86.1% interrupt, 10.4% idle
Mem: 248M Active, 1321M Inact, 218M Wired, 72M Cache, 99M Buf, 143M Free
Swap: 4096M Total, 4096M Free
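To catch this condition before it becomes an outage, a monitoring script could pull the interrupt percentage out of top's "CPU states" line. The Python sketch below assumes the line format shown above; the function names and the 50% alert threshold are hypothetical choices for illustration, not values taken from any Check Point guidance.

```python
import re

# Matches "<number>% <label>" pairs such as "86.1% interrupt" in top's CPU states line.
_PAIR = re.compile(r"([\d.]+)%\s+([a-z]+)")


def parse_cpu_states(line: str) -> dict[str, float]:
    """Return a {category: percent} mapping from a top 'CPU states:' line."""
    return {label: float(value) for value, label in _PAIR.findall(line)}


def interrupt_alarm(line: str, threshold: float = 50.0) -> bool:
    """Hypothetical check: True when interrupt time exceeds the threshold (percent)."""
    return parse_cpu_states(line).get("interrupt", 0.0) > threshold


if __name__ == "__main__":
    sample = ("CPU states:  1.8% user,  0.0% nice,  1.8% system, "
              "86.1% interrupt, 10.4% idle")
    print(parse_cpu_states(sample))
    print(interrupt_alarm(sample))
```

How the line is collected (for example, running top in batch mode from a scheduled job) is left open here, since it depends on the platform.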

ps -aux showed the software interrupt thread [swi1: net_taskq0] consuming most of the CPU:

cpfw[admin]# ps -aux
root    14 98.2  0.0     0    16  ??  RL   10Feb12 65517:46.72 [swi1: net_taskq0]

Running netstat -ni showed errors incrementing on a few interfaces. At first this looked like a hardware issue, so a failover to the secondary member was initiated, but the problem simply moved to the other firewall.
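An error counter that is merely non-zero says less than one that is climbing, so it can help to sample netstat -ni twice and compare. This Python sketch assumes output laid out as a header row followed by rows of whitespace-separated fields. The column names are parameters because they differ between platforms (for example Ierrs on BSD-derived systems such as IPSO, RX-ERR on Linux-based ones such as Gaia), and the function names are hypothetical. BSD-style output, which can list several rows per interface, is handled only loosely: rows that do not line up with the header are skipped and the largest counter seen per interface is kept.

```python
def error_counts(netstat_text: str, iface_col: str, err_col: str) -> dict[str, int]:
    """Map interface -> error counter from a netstat -ni style table.

    A simplification, not a full parser: lines before the header are ignored,
    rows whose field count differs from the header are skipped, and
    non-numeric counters (such as '-') are ignored.
    """
    counts: dict[str, int] = {}
    header = None
    for line in netstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Treat the first line containing both column names as the header.
            if iface_col in fields and err_col in fields:
                header = fields
            continue
        if len(fields) != len(header):
            continue
        value = fields[header.index(err_col)]
        if value.isdigit():
            iface = fields[header.index(iface_col)]
            counts[iface] = max(counts.get(iface, 0), int(value))
    return counts


def rising_errors(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return {interface: increase} for counters that grew between two samples."""
    return {
        iface: after[iface] - before.get(iface, 0)
        for iface in after
        if after[iface] > before.get(iface, 0)
    }
```

Interfaces that appear in rising_errors on several consecutive samples are the ones worth investigating first.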

After more digging, the culprit turned out to be some new traffic streams with low bandwidth but an extremely high packet rate (in this case, UDP syslog being forwarded to a host beyond the firewall). A layer 3 switch at the source end was also having problems, so some of the traffic patterns may have been anomalous, compounding the issue.

This traffic was not permitted by the firewall policy, so it was being matched by the drop rule. A large rulebase appears to make the problem worse: at thousands of packets per second, this traffic was consuming a lot of CPU cycles. Adding a rule near the top of the rulebase to permit the traffic reduced CPU usage significantly.

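A reasonable assumption is that, because these streams hit the drop rule so frequently, the entire rulebase is being evaluated over and over at a very high rate. This is consistent with how stateful firewalls usually behave: accepted traffic creates connection state that later packets can match quickly, whereas dropped packets typically leave no state behind, so each one is evaluated against the rulebase again. The handling of “flows” for UDP traffic is probably more limited than the IPSO/Gaia documentation implies.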
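The cost argument can be illustrated with a deliberately simplified model: a first-match rulebase scanned top to bottom, where accepted flows are cached in a connection table and dropped packets are not. This is a toy under those stated assumptions, not how the Check Point inspection engine is actually implemented; the Rule and ToyFirewall types, the filler rules, the port numbers and the addresses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Packet = tuple[str, str, int]  # hypothetical packet key: (src, dst, dst_port)


@dataclass
class Rule:
    name: str
    matches: Callable[[Packet], bool]
    action: str  # "accept" or "drop"


class ToyFirewall:
    """First-match rulebase plus a connection table for accepted flows only."""

    def __init__(self, rules: list[Rule]):
        self.rules = rules
        self.connections: set[Packet] = set()
        self.rule_checks = 0  # total rule comparisons performed

    def handle(self, pkt: Packet) -> str:
        if pkt in self.connections:  # existing state: no rulebase lookup needed
            return "accept"
        for rule in self.rules:
            self.rule_checks += 1
            if rule.matches(pkt):
                if rule.action == "accept":
                    self.connections.add(pkt)  # only accepted traffic creates state
                return rule.action
        return "drop"  # implicit drop if nothing matched


def build_rules(n_filler: int, permit_syslog_at_top: bool) -> list[Rule]:
    # Filler rules accept ports 10000.. that the syslog stream never uses.
    filler = [Rule(f"rule{i}", lambda p, i=i: p[2] == 10000 + i, "accept")
              for i in range(n_filler)]
    cleanup = Rule("cleanup", lambda p: True, "drop")
    if permit_syslog_at_top:
        syslog = Rule("syslog", lambda p: p[2] == 514, "accept")
        return [syslog] + filler + [cleanup]
    return filler + [cleanup]


def checks_for_stream(n_filler: int, permit_at_top: bool, packets: int) -> int:
    """Rule comparisons needed to process a stream of identical syslog packets."""
    fw = ToyFirewall(build_rules(n_filler, permit_at_top))
    pkt: Packet = ("10.1.1.1", "10.2.2.2", 514)
    for _ in range(packets):
        fw.handle(pkt)
    return fw.rule_checks
```

In this model, checks_for_stream(300, False, 1000) performs 301,000 rule comparisons (every dropped packet walks all 300 filler rules plus the cleanup rule), while checks_for_stream(300, True, 1000) performs one comparison followed by connection-table hits. The real engine is far more optimized, but the shape of the problem is the same: unmatched traffic pays the full lookup cost on every packet.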

It is worth enabling monitoring to find this sort of traffic, so that you can create appropriate rules, or move existing ones, near the top of the rulebase and avoid unnecessary processing, especially if your rulebase runs to hundreds of rules.
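As a starting point for that monitoring, flows could be ranked by packet rate and average packet size, using whatever per-flow counters your tooling exports. The FlowSample structure and both default thresholds below are hypothetical; they only encode the "high packet rate, low bandwidth" pattern described above and would need tuning for a real network.

```python
from dataclasses import dataclass


@dataclass
class FlowSample:
    """Hypothetical per-flow counters collected over one sampling interval."""
    src: str
    dst: str
    proto: str
    dst_port: int
    packets: int
    bytes: int
    seconds: float


def suspicious_flows(samples: list[FlowSample],
                     min_pps: float = 1000.0,
                     max_avg_bytes: float = 200.0) -> list[tuple[float, FlowSample]]:
    """Return (packets_per_second, sample) for flows with many small packets,
    highest packet rate first."""
    flagged = []
    for s in samples:
        if s.packets == 0 or s.seconds <= 0:
            continue  # nothing to measure
        pps = s.packets / s.seconds
        avg_size = s.bytes / s.packets
        if pps >= min_pps and avg_size <= max_avg_bytes:
            flagged.append((pps, s))
    flagged.sort(key=lambda item: item[0], reverse=True)
    return flagged
```

Flows that show up here and are not explicitly permitted are candidates for a dedicated rule near the top of the rulebase.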

One could conclude that it would be quite easy to DoS a policy-heavy Check Point firewall by sending a rapid stream of UDP packets to a far-side destination that doesn't match any rule in the rulebase. Note that this issue was encountered on an internal firewall where IPS was NOT enabled; IPS may mitigate the problem.