OK, sadly adding the "!sleep 5" did not help and in fact made things worse :(
E.g. rebooting the primary with the sleep in place resulted in this:
.
.
softraid0 at root
scsibus2 at softraid0: 256 targets
root on sd0a (8985ec86e22625f3.a) swap on sd0b dump on sd0b
carp0: state transition: INIT -> BACKUP
carp0: state transition: BACKUP -> INIT
carp0: state transition: INIT -> BACKUP
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: INIT -> BACKUP
carp: pfsync0 demoted group carp by 32 to 160 (pfsync init)
carp: pfsync0 demoted group pfsync by 32 to 32 (pfsync init)
carp: pfsync0 demoted group carp by 1 to 161 (pfsync bulk start)
carp: pfsync0 demoted group pfsync by 1 to 33 (pfsync bulk start)
carp1: state transition: BACKUP -> MASTER
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp: pfsync0 demoted group carp by -1 to 160 (pfsync bulk done)
carp: pfsync0 demoted group pfsync by -1 to 32 (pfsync bulk done)
carp: pfsync0 demoted group carp by -32 to 128 (pfsync init)
carp: pfsync0 demoted group pfsync by -32 to 0 (pfsync init)
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
carp1: state transition: MASTER -> BACKUP
carp0: state transition: MASTER -> BACKUP
carp1: state transition: BACKUP -> MASTER
carp0: state transition: BACKUP -> MASTER
arp_rtrequest: bad gateway value
.
All now settled and stable..
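To compare runs with and without the sleep, the flapping can be quantified by counting the transitions to MASTER per interface. A minimal sketch, assuming the messages have exactly the format shown above (the function name is made up for illustration):

```sh
#!/bin/sh
# count_carp_master (hypothetical helper): read kernel log lines on stdin
# and print, per carp interface, how many times it transitioned to MASTER.
count_carp_master() {
    awk '/^carp[0-9]+: state transition: .* -> MASTER/ {
             sub(/:$/, "", $1)   # "carp0:" -> "carp0"
             n[$1]++
         }
         END { for (i in n) print i, n[i] }' | sort
}
```

Used as "dmesg | count_carp_master". Fed only the excerpt above, it prints "carp0 5" and "carp1 5".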
NB: the 'arp_rtrequest: bad gateway value' message is completely
legitimate here. It appears because the 'inet alias' entries have a /32
mask, which is correct and ensures the routing table ends up with:
80.149.158.165     localhost       UGHS  0  0  33152  8  lo0
80.149.158.165/32  80.149.158.165  U     0  0  -      4  carp0
Apparently the devs have yet to remove this error (arp_rtrequest: bad
gateway value) for this case, where the configuration is correct.
I just really need our firewalls to be able to reboot without the one
that was rebooted taking master, while still keeping low advbase and
advskew values.
I would also like to improve what happens when a NIC cable is
reconnected: currently, pulling a NIC cable causes fail-over (great),
but plugging it back in results in double master, and thus the
reconnected firewall takes over! :(
A CARP INIT pause seems like the obvious solution..
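Until something like that exists, a possible userland stopgap (untested here, and only a sketch) is to hold the CARP demotion counter up for a while after boot or link-up, using the carpdemote option of ifconfig(8). The function name, the amount and the hold time below are assumptions for illustration only:

```sh
#!/bin/sh
# Hypothetical helper: raise the demotion counter of the "carp" interface
# group, wait, then lower it again by the same amount.
IFCONFIG=${IFCONFIG:-ifconfig}   # overridable so the logic can be dry-run

carp_hold() {
    demote_by=$1   # amount added to the carp group demotion counter
    hold_secs=$2   # how long to stay demoted, in seconds
    "$IFCONFIG" -g carp carpdemote "$demote_by" || return 1
    sleep "$hold_secs"
    "$IFCONFIG" -g carp -carpdemote "$demote_by"
}
```

E.g. "carp_hold 50 30 &" from /etc/rc.local. Whether this really avoids the double master window while STP is still blocking the port would need testing, and it does nothing by itself for the cable reconnect case unless something triggers it on link-up.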
Thanks for your thoughts :)
Andy.
On Thu 18 Jul 2013 12:34:11 BST, Andy wrote:
Hi,
Others have discussed our problem but I cannot see that this has been
implemented (I cannot find a man page referring to it).
http://openbsd.7691.n7.nabble.com/carp-init-delay-td226187.html
I.e. when a firewall boots up, the connected switch port starts STP and
is initially blocked, so the newly booted firewall hears no
advertisements and thinks it is master. The port then starts forwarding
and I have double master.
This also causes issues with other daemons which monitor the CARP state,
like sasyncd, bgpd etc.
I have enabled port fast where I can. However I cannot guarantee this
everywhere, and on the WAN connections to our data centre network they
do not want to enable port fast. This means I have to set a high
advbase, but that ruins the fail-over response time.
I could add "!sleep 5" to the top of the carp interface configs
(hostname.carpN) as suggested in the link above, but this really belongs
in the kernel. The sleep only helps with the firewall reboot case and
not with all the other possible network state changes, like the removal
and reconnection of a NIC cable (which restarts STP etc).
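For reference, that workaround amounts to something like the following in /etc/hostname.carp0. This is only a sketch: the addresses, vhid, carpdev, password and advbase/advskew values are placeholders, not our real configuration.

```
# /etc/hostname.carp0 (illustrative placeholder values only)
# A line starting with "!" is run as a shell command by netstart(8),
# so this pauses 5 seconds before the carp interface is configured.
!sleep 5
inet 192.0.2.10 255.255.255.0 192.0.2.255 vhid 1 carpdev em0 advbase 1 advskew 0 pass examplepass
```

The sleep is only executed when netstart processes the file, which is why it cannot help with later link state changes.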
Has this been done? :)