Re: CARP on Switch ports without port fast leading to double master-master problems

Andy Mon, 22 Jul 2013 04:14:57 -0700

I messed up and added '!sleep 5' to the hostname.carp file instead of the one for the physical interface..

Nonetheless, I'm surprised that no one else has any thoughts on this when it has been discussed several times before.

It would be /very/ easy to resolve (by someone with talent and experience of the code base ;) and would aid the stability of OpenBSD greatly (in an operational sense), as the knock-on effects really affect sasyncd, openbgpd and openospfd, to name the ones I have problems with when a cable is pulled, a NIC is reset etc.


E.g. https://groups.google.com/forum/#!topic/fa.openbsd.tech/NXy2rivB_z0

I cannot imagine there is a technical challenge beyond adding a sleep(x) at INIT state with 'x' taken from a sysctl value (saying this as someone who doesn't know the code base at all). One day when I have more time I will check it out..

If you happen to have the code base nearby I would really appreciate it if you could throw a sleep in after CARP moves to INIT.
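
As a rough illustration of the proposal, here is a toy Python model (not OpenBSD kernel code) of what an INIT pause would change. Everything in it is an assumption made for the sketch: the function name, the `init_delay_s` parameter standing in for the hypothetical sysctl value, and the master-down timeout of 3 * advbase + advskew / 256 seconds.

```python
def state_when_port_forwards(advbase, advskew, port_block_s, init_delay_s=0.0):
    """Return the state of a freshly linked node when its switch port starts forwarding.

    Toy model: after link-up the node sits in INIT for init_delay_s, then
    moves to BACKUP and starts its master-down timer. While the port is
    blocked no advertisements from the real master arrive, so if the timer
    expires first the node promotes itself to MASTER.
    """
    # Assumed timeout formula, in seconds.
    master_down_s = 3 * advbase + advskew / 256
    if init_delay_s >= port_block_s:
        return "INIT"      # still paused; it will hear the master once in BACKUP
    if init_delay_s + master_down_s <= port_block_s:
        return "MASTER"    # promoted while deaf: double master
    return "BACKUP"        # timer still running; adverts now get through


# Hypothetical numbers: advbase 1, port blocked for 30 seconds.
print(state_when_port_forwards(1, 0, port_block_s=30))                   # MASTER
print(state_when_port_forwards(1, 0, port_block_s=30, init_delay_s=35))  # INIT
```

The model only covers the window in which the port is blocked; what happens once advertisements are heard again still depends on advskew and preemption settings.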


Thanks everyone,
Andy.


On Thu 18 Jul 2013 13:04:01 BST, Andy wrote:

Ok, sadly adding the !sleep 5 is not helping and made it even worse :(

E.g. the reboot of the primary with the sleep resulted in this;
.
.
softraid0 at root
scsibus2 at softraid0: 256 targets
root on sd0a (8985ec86e22625f3.a) swap on sd0b dump on sd0b
carp0: state transition: INIT -> BACKUP
carp0: state transition: BACKUP -> INIT
carp0: state transition: INIT -> BACKUP
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: INIT -> BACKUP
carp: pfsync0 demoted group carp by 32 to 160 (pfsync init)
carp: pfsync0 demoted group pfsync by 32 to 32 (pfsync init)
carp: pfsync0 demoted group carp by 1 to 161 (pfsync bulk start)
carp: pfsync0 demoted group pfsync by 1 to 33 (pfsync bulk start)
carp1: state transition: BACKUP -> MASTER
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp: pfsync0 demoted group carp by -1 to 160 (pfsync bulk done)
carp: pfsync0 demoted group pfsync by -1 to 32 (pfsync bulk done)
carp: pfsync0 demoted group carp by -32 to 128 (pfsync init)
carp: pfsync0 demoted group pfsync by -32 to 0 (pfsync init)
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
.
All now settled and stable..


NB: the 'arp_rtrequest: bad gateway value' message is completely
legitimate; it appears because the 'inet alias ' entries have a /32
mask, which is correct to ensure the routing table ends up with

80.149.158.165     localhost          UGHS       0        0 33152     8 lo0
80.149.158.165/32  80.149.158.165     U          0        0     -     4 carp0

Apparently this error (arp_rtrequest: bad gateway value) is yet to be
removed by the devs for this condition where it is correct.


I just really need our firewalls to be able to reboot without the
rebooted one taking master, while still keeping low advbase and
advskew values.

It would also help when a NIC cable is reconnected: currently pulling
a NIC cable causes fail-over (great), but plugging it back in results
in double master and thus it takes over! :(

A CARP INIT pause seems like the obvious solution..


Thanks for your thoughts :)
Andy.


On Thu 18 Jul 2013 12:34:11 BST, Andy wrote:

Hi,

Others have discussed our problem but I cannot see that this has been
implemented (I cannot find a man page referring to this).
http://openbsd.7691.n7.nabble.com/carp-init-delay-td226187.html

I.e. when a firewall boots up, the connected switch port starts STP and
is initially blocked, causing the newly booted firewall to think it is
master; the port then starts forwarding and I have double master.

This causes issues with other daemons too which monitor the CARP state,
like sasyncd, bgpd etc.

I have enabled port fast where I can. However I cannot guarantee this
everywhere, and on the WAN connections to our data centre network the
other side does not want to enable port fast. This means I have to set
a high advbase, but this ruins the fail-over response time.

I could add "!sleep 5" to the top of the carp interface files as
suggested in the link above, but this really belongs in the kernel, as
the sleep only helps with the firewall reboot condition and not with all
the other possible network state changes, like the removal and
reconnection of a NIC cable (which restarts STP etc).

Has this been done? :)
