Followup...

On 5/12/23 08:17, Stuart Henderson wrote:
On 2023-05-12, Nick Holland <n...@holland-consulting.net> wrote:
...
I had several other people suggest network problems.  I'm not going to
say "impossible" or even "unlikely", but my understanding is that the
two machines are both plugged into the same switch, in the same rack.


I've since had someone more familiar with the physical environment say
my blind trust in their switch hw may be slightly misplaced. :)

You can also look at

netstat -ni -I ixl0
netstat -ni -I ixl0 -e
kstat ixl0:::


These looked REALLY clean: no drops, fails or collisions.

which may give some other clues

even pfctl -si might have something relevant

Several people pointed out I was using the default advskew of 1 second,
which means a small network glitch (or system load?  maybe I'm all wrong
about this system never breaking a sweat, at least when it comes to
network traffic) would flip it, so I've increased it to 10 on both
machines (and apparently just induced a flip of my own. oops).  By the
nature of this system, some people will be annoyed by any flip, so it
really doesn't matter if it was a 1 second outage or a 30 second outage,
I just want the system available again after an unhappy event (or
routine maintenance).

the coarse adjustment in seconds is advbase, advskew is a much smaller
delay meant for a config with primary/backup where the backup advertises
just slightly less frequently.
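
To make the advbase/advskew distinction concrete, here is a rough sketch of the timing arithmetic. It assumes the behaviour documented in carp(4), where advskew is counted in 1/256ths of a second and added to advbase, and it assumes a backup waits about three advertisement periods before taking over. The function names are made up for illustration and are not part of any OpenBSD tool.

```python
# Rough model of CARP timing. Assumptions (not taken from this thread):
#   - advbase is in whole seconds, advskew is in 1/256ths of a second
#   - a backup declares the master down after about 3 * advbase seconds
#     plus its own skew
# Function names are hypothetical.

def advert_interval(advbase: int, advskew: int) -> float:
    """Seconds between advertisements sent by a host."""
    return advbase + advskew / 256


def master_down_timeout(advbase: int, advskew: int) -> float:
    """Approximate seconds of silence before a backup takes over."""
    return 3 * advbase + advskew / 256


if __name__ == "__main__":
    for base, skew in [(1, 0), (1, 100), (10, 0), (10, 100)]:
        print(
            f"advbase={base:2d} advskew={skew:3d} "
            f"interval={advert_interval(base, skew):.3f}s "
            f"takeover after ~{master_down_timeout(base, skew):.3f}s"
        )
```

Under these assumptions, raising advbase from 1 to 10 stretches the takeover window from about 3 seconds to about 30, which is why short glitches stop causing flips, at the cost of a slower failover when the master really is down.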

Um. yeah.  I set advbase, and typed advskew in the e-mail. My bad.
After setting it to 10, I have gone over two weeks without any flips, so
that looks like a pretty good fix.
Thanks for the guidance!

Nick.
