I have a case where a large (60+), rapid flurry of incoming GARP messages is evidently being only partially processed. Specifically, the first 25 GARP messages are applied to the ARP cache and the remainder are ignored. I'm looking for a better understanding of why this happens, as well as suggestions on how to work around or solve the problem.
Background... We have a double-firewalled arrangement. The inner firewall is a Checkpoint ClusterXL pair of Resilience Ndurant modules running Checkpoint NG AI R55 HFA_17, and the outer firewall is a pair of Soekris 4801 boxes running OpenBSD 3.7 in a PF/PFsync/CARP pair. Between the two pairs of firewalls is our DMZ; behind the inner firewall is our trusted net. The ClusterXL pair runs as active/standby (as does the OpenBSD pair). Nominally, this all works fine.

Now, the ClusterXL pair assigns a single virtual IP address to the outer interface of the active cluster member. The active cluster member provides proxy ARP for all NAT'd hosts. Whenever the active member sends an ARP reply for its virtual outer IP address or one of the NAT'd hosts, it uses the MAC address of the physical ethernet interface. Whenever a cluster failover occurs, the new active member assigns the outer virtual IP address to itself, then sends GARP messages out the outer interface for the virtual IP address as well as for the NAT address of each NAT'd host. Since we have about 61 NAT'd hosts, that's 62 GARP messages that get sent. This set of GARP messages is sent quickly as a flurry: less than 7 ms for all 62 messages.

We find that when this flurry of GARP messages arrives at the outer firewall (observed using "tcpdump -i sis2 -ne 'arp'"), the relevant ARP cache entries for the first through the 25th GARP messages are indeed updated with the new MAC address. ARP entries for any GARP message after the 25th are not updated; these entries retain their stale MAC value until the ARP entry expires several minutes later. However, if we manually send a lone GARP message (using the 'garp' command) from the inner firewall for one of these stale ARP entries, then it -is- updated. So the only affected ARP cache entries are those whose GARP message is preceded by at least 25 other GARP messages within the last few milliseconds; a lone GARP message (not part of a 'flurry') works fine.
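To make the message format concrete, here is a minimal Python sketch of one gratuitous ARP frame of the kind sent in the flurry. It only builds the bytes; it does not send anything. The MAC and IP values are hypothetical placeholders, and I am assuming the request form (opcode 1) with an all-zero target hardware address. I have not confirmed which form ClusterXL actually emits, and implementations vary on both points.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST, ARP_REPLY = 1, 2


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes(int(part, 16) for part in mac.split(":"))
    if len(raw) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return raw


def build_garp(sender_mac: str, ip: str, op: int = ARP_REQUEST) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP announcing `ip`.

    Assumption: target hardware address is all zeros; some stacks use
    the broadcast address or the sender's own MAC instead.
    """
    smac = mac_to_bytes(sender_mac)
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our source, EtherType ARP.
    eth = BROADCAST_MAC + smac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op)
    # Gratuitous: sender and target protocol address are the same IP.
    arp += smac + addr + b"\x00" * 6 + addr
    return eth + arp


# Hypothetical values: one frame for the virtual IP plus one per NAT address.
NEW_ACTIVE_MAC = "00:00:5e:00:53:01"
announced = ["192.0.2.1"] + [f"192.0.2.{n}" for n in range(10, 71)]
flurry = [build_garp(NEW_ACTIVE_MAC, ip) for ip in announced]
```

Each frame is a plain 42-byte Ethernet/ARP packet before padding, so a flurry of 62 of them is very little data; the interesting part is how quickly they arrive back to back.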
This sounds an awful lot like some sort of congestion. We found that other machines (mainly FreeBSD) that are peers alongside the outer firewall's inner interface on the DMZ don't suffer from this symptom. But then, these machines are not firewalls.

Now, I realize that incoming ARP messages are handled on BSD systems with a network interrupt (NETISR) dedicated specifically to ARP, separate from the NETISR for IP traffic. And I -believe- that the receive queue for the ARP NETISR is not too big (room for 50 mbufs, I recall). Also, since an OpenBSD firewall uses PF for filtering, and PF does rule processing in interrupt mode, a busy OpenBSD firewall can typically be seen to be computing almost entirely in interrupt mode when not idle. Indeed, our active OpenBSD firewall is often 30% to 50% idle, with pretty much all other computing in interrupt mode. I wonder if this behavior can starve the ARP NETISR somewhat.

I know that NETISR receive queues have a drop counter associated with them (to count drops of an mbuf when the receive queue is full), but as far as I know there's no way to inspect the ARP receive queue drop count: the count is maintained by the kernel, but no tool displays it (e.g. netstat doesn't show it). I've checked through /var/log/pflog and I see no evidence that any ARP messages were dropped by PF.

Could it be that the large amount of time a firewall dwells in interrupt mode for PF processing can somehow cause the ARP receive queue to fill more easily than otherwise? Why is '25' the magic number for seemingly dropped GARP messages instead of '50'? This is 100% reproducible at exactly 25. Can anyone think of a way to inspect the ARP NETISR drop count? Any ideas on how to work around or fix this? I don't see any sysctl settings in 3.7 for the ARP receive queue size. And I can find no way to throttle or pace the Checkpoint ClusterXL GARP messages.
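To make the congestion hypothesis concrete, here is a toy Python model of a tail-drop receive queue. It is only an illustration, not a description of the OpenBSD kernel: the function name, the queue limits, and the assumption that nothing is drained until the whole burst has arrived are all mine.

```python
from collections import deque


def deliver_burst(n_messages: int, queue_limit: int):
    """Enqueue a burst into a bounded queue that is not drained meanwhile.

    Returns (sequence numbers that were queued, number dropped).
    Assumption: the consumer is starved for the whole burst, so a
    message is dropped whenever the queue is already at its limit.
    """
    queue = deque()
    drops = 0
    for seq in range(1, n_messages + 1):
        if len(queue) >= queue_limit:
            drops += 1          # queue full: tail drop
        else:
            queue.append(seq)
    # After the burst the consumer finally runs and applies what is queued.
    return list(queue), drops


# A limit of 25 reproduces the observed pattern for a 62-message flurry.
applied_25, dropped_25 = deliver_burst(62, 25)
# A limit of 50 would let the first 50 through instead.
applied_50, dropped_50 = deliver_burst(62, 50)
```

With a limit of 25 the model applies messages 1 through 25 and drops the other 37, which matches what we see. With a limit of 50 it applies the first 50, which is why the exact figure of 25 puzzles me if the queue really has room for 50 mbufs.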
Bill
--
William Bloom | Snr Systems Engineer | M P H A S I S Architecting Value | Eldorado Computing
5353 North 16th Street, Suite 400, Phoenix, Az 85016 | Direct: +11-602-604-3100 | Fax: +11-602-604-3115 | http://www.eldocomp.com