Re: 4.7 ospfd FIB/RIB synchronization

David Gwynne Wed, 20 Apr 2011 08:08:06 -0700

On 20/04/2011, at 11:08 PM, Jonathan Lassoff wrote:

> On Wed, Apr 20, 2011 at 4:22 AM, David Gwynne <l...@animata.net> wrote:
>> you might be able to upgrade your passive firewall to 4.9 next to the
>> active 4.7 one. it looks like the protocol stayed the same so they should be
>> able to talk to each other.
>
> This would seem to be the case.
>
> This (http://undeadly.org/cgi?action=article&sid=20090301211402) is an
> absolutely excellent bit of writing about the improvements to pfsync,
> BTW. Thanks for letting that be shared.
>
>> however, it looks like bulk updates were broken in 4.7, which would explain
>> your failover problems. you can work around that by going "pfctl -S
>> /dev/stdout | ssh activefw pfctl -L /dev/stdin" as root on the passive fw.
>
> As an initial seeding of state? It seems to me that only some of my
> flows get affected when failing over (not everything is reset and
> traffic can still flow).


yes. the pfctl commands will do a bulk update by hand, since the in-kernel
implementation was unreliable back then.

> It appears that both firewalls have an approximately congruent set of
> states, but usually a "pfctl -ss | wc -l" can be off by several
> hundred, to several thousand states at times. My hunch is that state
> creation and counter updates are not updated synchronously, so when
> failing over there are still some updates in-flight, and for flows
> that are moving their sequence numbers at a decent clip I could see
> why they might get reset.

pf has a bit of fuzz when it does its tcp window matching, so packets can get
ahead of the firewall and be ok. also, pf will drop out of window packets
rather than send RSTs and such. pfsync will also make a good effort to merge
state updates with local changes and will aggressively send updates to its
peers when it thinks traffic has recently gone over both legs of a firewall.
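
A rough way to picture the fuzz described above is the toy model below. It is only a sketch and not pf's actual state-tracking code: the names `seq_diff` and `verdict`, and the `window` and `fuzz` parameters, are made up for illustration. A segment passes if it lands inside the tracked window widened by some slack, and is otherwise dropped silently with no RST.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits


def seq_diff(a, b):
    """Signed distance a - b in 32-bit sequence space."""
    d = (a - b) % SEQ_MOD
    return d - SEQ_MOD if d >= SEQ_MOD // 2 else d


def verdict(seq, expected_seq, window, fuzz):
    """Hypothetical check: pass if seq is within the window widened by fuzz."""
    ahead = seq_diff(seq, expected_seq)
    if -fuzz <= ahead <= window + fuzz:
        return "pass"
    # out of window: the packet is dropped quietly, nothing is sent back
    return "drop"
```

With a slightly stale `expected_seq` on the firewall that just took over, a flow that has run ahead by less than `window + fuzz` still passes in this model. One that has run further ahead is dropped.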

however, if the bulk update didn't work properly then you can have some states
missing after failover. if the state doesn't exist then you fall through to the
ruleset; pfsync doesn't ask its peers for missing states. this used to affect
me with very long lived connections that could be idle for a while (eg, nfs).
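
To see which states are actually missing on a peer, rather than just comparing "pfctl -ss | wc -l" counts, the two state listings can be diffed. The sketch below assumes each line of "pfctl -ss" output ends with a state column such as ESTABLISHED:ESTABLISHED, and that everything before it identifies the flow. The helper names `state_keys` and `missing_states` are hypothetical.

```python
def state_keys(pfctl_ss_output):
    """Return the set of flows in a 'pfctl -ss' listing, without the state column."""
    keys = set()
    for line in pfctl_ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        # drop the last field (assumed to be the protocol state, which can
        # legitimately differ between peers while updates are in flight)
        keys.add(" ".join(fields[:-1]))
    return keys


def missing_states(active_output, passive_output):
    """Flows present on the active firewall but absent on the passive one."""
    return sorted(state_keys(active_output) - state_keys(passive_output))
```

Feed it the captured output of "pfctl -ss" from each firewall. Anything it returns is a flow that would fall through to the ruleset after a failover.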

> Have you ever used pfsync with the "defer" option set? I can imagine
> that it just takes longer for sessions to start since each firewall
> would have to wait for the insertion of the state on the other
> firewall, but I wonder how much latency that adds in practice.

i wrote defer, so yes...

on my boxes the increase in latency is about 0.2 to 0.3ms. if a firewall is
missing its peer(s) it will go up to about 1/100th of a second (10ms).

> Another open question would be what to do in the case of multiple
> firewalls receiving the multicast update (not applicable for me, but
> something I'm considering trying). I wonder if there ought to be a
> hook for defer to count the number of related received state insertion
> messages it gets before starting.

the code assumes that if one peer got and acked the update, then all your
peers got the update.
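
As a toy model of the behaviour described above (not the kernel code): the deferred packet goes out as soon as the first peer acks, or after roughly 10ms if no ack turns up. The function name `release_delay_ms` and its parameters are hypothetical, and the 10ms default is just the "1/100th of a second" figure from this mail.

```python
def release_delay_ms(ack_delays_ms, timeout_ms=10.0):
    """Delay before a deferred packet is released.

    ack_delays_ms: per-peer ack delay in milliseconds, None for a peer that
    never answers. The first ack wins; other peers are assumed to have the
    update too.
    """
    arrived = [d for d in ack_delays_ms if d is not None and d < timeout_ms]
    return min(arrived) if arrived else timeout_ms
```

So with peers acking after 0.25ms and 3ms the packet is held for 0.25ms, and with no peers answering it is held for the full timeout.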

>> as a matter of interest, are you using ospf for failover on one side of
>> your firewalls?
>
> I'm hooking CARP interfaces up into ospfd to signal to my IGP which
> firewall is active at a given time. ospfd seems to have hooks into
> CARP which will change LSA metrics based on the CARP state.
>
> For the interfaces that these firewalls are announcing into the IGP,
> CARP is used to direct upstream traffic at the active router.

thats exactly how i have my stuff configured.

dlg
