I have a case where a large (60+), rapid flurry of incoming GARP messages is
evidently being only partially processed.  Specifically, the first 25 GARP
messages are applied to the ARP cache and the remainder are ignored.  I'm
looking for a better understanding of why this happens, as well as suggestions
on how to work around or solve the problem.

Background...

We have a double-firewalled arrangement.  The inner firewall is a Checkpoint
ClusterXL pair of Resilience Ndurant modules running Checkpoint NG AI R55
HFA_17, and the outer firewall is a pair of Soekris 4801 boxes running OpenBSD
3.7 in a PF/PFsync/CARP pair.  Between the two pairs of firewalls is our DMZ;
behind the inner firewall is our trusted net.  The ClusterXL pair runs as
active/standby (as does the OpenBSD pair).

Nominally, this all works fine.

Now, the ClusterXL pair assigns a single virtual IP address to the outer
interface of the active cluster member.  The active cluster member provides
proxy ARP for all NAT'd hosts.  Whenever the active member sends an ARP reply
for its virtual outer IP address or one of the NAT'd hosts, it uses the MAC
address for the physical ethernet interface.  Whenever a cluster failover
occurs, the new active member assigns the outer virtual IP address to itself,
then sends GARP messages out the outer interface for the virtual IP address as
well as for the NAT address of each NAT'd host.  Since we have about 61 NAT'd
hosts, that's 62 GARP messages that get sent.  This set of GARP messages is sent
fairly quickly as a flurry, less than 7 ms for all 62 messages.
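
To put rough numbers on that burst, here is a small back-of-envelope sketch.
It uses only the figures quoted above (61 NAT'd hosts, one virtual IP, under
7 ms); the variable names are made up for illustration, and the results are
bounds rather than measurements.

```python
# Back-of-envelope figures for the failover burst described above.
NAT_HOSTS = 61          # from the post
VIRTUAL_IPS = 1         # the cluster's outer virtual IP address
BURST_SECONDS = 0.007   # upper bound on burst duration ("less than 7 ms")

messages = NAT_HOSTS + VIRTUAL_IPS

# Shorter bursts mean higher rates, so this is a lower bound on the rate.
min_rate = messages / BURST_SECONDS

# Mean spacing between consecutive messages; an upper bound for the same reason.
max_gap_us = BURST_SECONDS / (messages - 1) * 1e6

print(f"{messages} GARPs, at least {min_rate:.0f} msg/s, "
      f"at most {max_gap_us:.0f} us apart on average")
```

In other words, the receiver sees a new GARP roughly every hundred
microseconds or faster for the duration of the burst.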

We find that when this flurry of GARP messages arrives at the outer firewall
(observed using "tcpdump -i sis2 -ne 'arp'"), the relevant ARP cache entries
for the first through the 25th GARP messages are indeed updated with the new MAC
address.  ARP entries for any GARP message after the 25th are not updated;
these entries retain their stale MAC value until their ARP entry expires several
minutes later.

However, if we manually send a lone GARP message (using the 'garp' command) from
the inner firewall for one of these stale ARP entries, then it -is- updated.
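
For reference, this is roughly what a single gratuitous ARP looks like on the
wire.  It is a minimal sketch of the generic frame layout, not a description
of what the 'garp' command or ClusterXL actually emits; the function name,
the example addresses, and the choice of target hardware address in the reply
form are assumptions.  It only builds the bytes and does not send anything.

```python
import socket
import struct

def build_garp(mac: str, ip: str, as_reply: bool = False) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP for ip/mac.

    mac must be six zero-padded hex octets separated by colons.
    """
    hw = bytes.fromhex(mac.replace(":", ""))
    if len(hw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: broadcast destination, our source, EtherType 0x0806 (ARP).
    eth = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP fixed part: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, then opcode 1 (request) or 2 (reply).
    opcode = 2 if as_reply else 1
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)

    # What makes it gratuitous: sender and target IP are both the announced IP.
    # Assumption: all-zero target MAC for requests, broadcast for replies.
    target_hw = broadcast if as_reply else b"\x00" * 6
    arp += hw + addr + target_hw + addr
    return eth + arp

# Example with documentation-range addresses (not from our network).
frame = build_garp("00:00:5e:00:53:01", "192.0.2.10")
print(len(frame), frame.hex())
```

A receiver that already holds a cache entry for the announced IP is expected
to overwrite the MAC with the sender hardware address from such a frame,
which is the update we see for a lone GARP.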

So the only affected ARP cache entries are those for which a GARP message is
preceded by at least 25 GARP messages within the last few milliseconds; a
lone GARP message (not part of a 'flurry') works fine.

This sounds an awful lot like some sort of congestion.

We found that other machines (mainly FreeBSD) that are peers alongside the outer
firewall's inner interface on the DMZ don't suffer from this symptom.  But then,
these machines are not firewalls.

Now, I realize that incoming ARP messages are handled on BSD systems with a
network interrupt (NETISR) dedicated specifically to ARP, separate from the NETISR
for IP traffic.  And I -believe- that the receive queue for the ARP NETISR is
not too big (room for 50 mbufs, I recall).  Also, since an OpenBSD firewall uses
PF for filtering, and PF does rule processing in interrupt mode, then a busy
OpenBSD firewall can typically be seen to be computing almost entirely in
interrupt mode when not idle.  Indeed, our active OpenBSD firewall is often 30%
to 50% idle with pretty much all other computing in interrupt mode.  I wonder if
this behavior can starve the ARP NETISR somewhat.

I know that NETISR receive queues have a drop counter associated with them (to
count drops of an mbuf when the receive queue is full), but as far as I know
there's actually no way to inspect the ARP receive queue drop count - the count
is maintained by the kernel but there is no tool that displays it (e.g. netstat
doesn't show this).
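
To make the suspected mechanism concrete, here is a toy model of a
fixed-length receive queue with a drop counter.  It is not kernel code: the
class and function names are invented, and it assumes the worst case where
the whole burst arrives before the consumer gets to run even once.  The
queue lengths tried below are just parameters, not claims about what 3.7
actually uses.

```python
from collections import deque

class BoundedQueue:
    """Toy model of a fixed-length input queue with a drop counter."""

    def __init__(self, maxlen):
        self.maxlen = maxlen
        self.items = deque()
        self.drops = 0

    def enqueue(self, item):
        # Queue full: count the drop and discard the new arrival.
        if len(self.items) >= self.maxlen:
            self.drops += 1
            return False
        self.items.append(item)
        return True

    def drain(self):
        """Consumer runs: hand back everything queued, in arrival order."""
        drained = list(self.items)
        self.items.clear()
        return drained

def simulate_burst(burst, maxlen):
    """Enqueue an entire burst before the consumer is allowed to run."""
    q = BoundedQueue(maxlen)
    for seq in range(1, burst + 1):
        q.enqueue(seq)
    return q.drain(), q.drops

for limit in (25, 50):
    processed, drops = simulate_burst(62, limit)
    print(f"maxlen={limit}: processed {len(processed)}, dropped {drops}")
```

In this model a 62-message burst against a limit of 25 processes exactly
messages 1 through 25 and drops the other 37, which matches the symptom; a
limit of 50 would lose only the last 12.  That is why the observed cutoff of
25 rather than 50 is puzzling.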

I've checked through /var/log/pflog and I see no evidence that any ARP messages
were dropped by PF.

Could it be that the large amount of time a firewall dwells in interrupt mode
for PF processing can somehow cause the ARP receive queue to get full more
easily than otherwise?  Why is '25' the magic number for seemingly dropped GARP
messages instead of '50'?  This is 100% reproducible at exactly 25.  Is there
any way that anyone can think of that I can inspect the ARP NETISR drop count?
Any ideas on how to work around or fix this?  I don't see any sysctl settings in
3.7 for the ARP receive queue size.  And there appears to be no way to throttle
or pace the Checkpoint ClusterXL GARP messages that I can find.
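
One workaround I can imagine, if the cluster itself cannot be paced, is to
re-announce the same addresses afterwards from a script in small, spaced
batches.  The sketch below shows only the pacing logic.  The batch size of 20
and the 50 ms gap are guesses chosen to stay under the observed limit of 25,
and send() is a hypothetical placeholder for whatever actually emits one GARP
(for example, a wrapper around the 'garp' command).

```python
import time

def paced_batches(addresses, batch_size=20, gap_seconds=0.05):
    """Yield (delay_before_batch, batch) pairs covering every address once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(addresses), batch_size):
        # No delay before the first batch, a fixed gap before each later one.
        delay = 0.0 if start == 0 else gap_seconds
        yield delay, addresses[start:start + batch_size]

def reannounce(addresses, send, sleep=time.sleep, **pacing):
    """Call send(addr) for each address, sleeping between batches."""
    sent = 0
    for delay, batch in paced_batches(addresses, **pacing):
        if delay:
            sleep(delay)
        for addr in batch:
            send(addr)   # hypothetical: emit one GARP for addr
            sent += 1
    return sent

# Dry run with documentation-range addresses and a send() that only records.
announced = []
total = reannounce([f"192.0.2.{n}" for n in range(1, 63)], announced.append)
print(total, "announcements in",
      len(list(paced_batches(announced))), "batches")
```

Whether a 50 ms gap is long enough for the receiving side to drain its queue
is something that would have to be tested.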


Bill
-- 
William Bloom| Snr Systems Engineer|M P H A S I S Architecting Value | Eldorado
Computing
5353 North 16th Street, Suite 400 Phoenix, Az 85016 | Direct: +11-602-604-3100 |
Fax: +11-602-604-3115| http://www.eldocomp.com
