Hi,

I have a puzzling issue with CARP and wondered whether anyone knows
the answer. I have two CARP + pf + pfsync boxes (on OpenBSD 4.2) in
a standard failover configuration (master and backup designated by
advskew values). When the master is brought down, failover works
nicely. When the master comes back up, though, it takes control
straight away but doesn't respond to anything for between 5 and 20
seconds. I have found a workaround by enabling portfast on the
switch ports that the firewall is connected to, but it doesn't
make any sense to me why the firewall acts this way when portfast
is disabled.

Looking at the sequence of events when the master comes up I have:

1. Network interface comes up.
2. Switch port cycles through listening, learning, and finally moves
to forwarding state.
3. At the precise moment that the port enters the forwarding state,
packets flow to and from the firewall, so it wins the election and
becomes the master again, as one would expect.
4. HOWEVER, although the master now originates and receives traffic,
it doesn't respond to any of it, i.e. it won't send an echo reply to a
request or ACK any TCP traffic. It stays like this for between 5 and
20 seconds.
5. After those 5 to 20 seconds, the machine starts responding.
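
One way to put numbers on steps 4 and 5 would be to probe the firewall
once a second from another host while the master comes back, and then
reduce the log to the periods with no reply. Below is a minimal Python
sketch of that reduction step only. The function name and the
(timestamp, replied) sample format are made up for illustration, and
how the probes are collected (ping, a TCP connect, etc.) is left open.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[float, bool]               # (timestamp in seconds, got a reply?)
Window = Tuple[float, Optional[float]]    # (first missed probe, first answered probe)


def unresponsive_windows(samples: Iterable[Sample]) -> List[Window]:
    """Collapse time-ordered probe results into unresponsive windows.

    Each window starts at the first unanswered probe and ends at the
    first probe that is answered again. If the log ends while probes
    are still unanswered, the window's end is None.
    """
    windows: List[Window] = []
    start: Optional[float] = None
    for timestamp, replied in samples:
        if not replied and start is None:
            start = timestamp                  # outage begins
        elif replied and start is not None:
            windows.append((start, timestamp))  # outage ends
            start = None
    if start is not None:
        windows.append((start, None))          # still down at end of log
    return windows


if __name__ == "__main__":
    # Hypothetical log: replies stop at t=1.0 and resume at t=3.0.
    log = [(0.0, True), (1.0, False), (2.0, False), (3.0, True)]
    for begin, end in unresponsive_windows(log):
        print(begin, end)
```

Comparing the window lengths with portfast enabled and disabled would
show whether the delay is really constant or varies across the 5 to 20
second range.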

If I turn on portfast on the switch ports, the sequence is exactly
the same, except that the 5 to 20 second delay isn't there.

I have looked at the pf logs, and pf seems to have initialised
correctly and is passing in the echo requests, but I don't see
anything after this. So I have ruled out pf from my investigation.

I wondered if anyone had come across anything similar in the past, or
whether anyone has any advice on what to try to track down the issue?
Although I can fix it by turning on portfast (which is easy to do),
I'd like to know why this happens so that I better understand the
system as a whole, so if anyone has any hints I'd really appreciate
hearing them.
Thanks!
