On 12/05/2023 14:43, Nick Holland wrote:
> I had several other people suggest network problems.  I'm not going to
> say "impossible" or even "unlikely", but my understanding is that the
> two machines are both plugged into the same switch, in the same rack.
>
> Several people pointed out I was using the default advskew of 1 second,
> which means a small network glitch (or system load?  maybe I'm all wrong
> about this system never breaking a sweat, at least when it comes to
> network traffic) would flip it, so I've increased it to 10 on both
> machines (and apparently just induced a flip of my own. oops).  By the
> nature of this system, some people will be annoyed by any flip, so it
> really doesn't matter if it was a 1 second outage or a 30 second outage,
> I just want the system available again after an unhappy event (or
> routine maintenance).
>
> Nick.

Usually it's a network problem. The big delay of 3 days you had also suggests 
that.

But on the other hand, I also had a similar problem in one of my load balancers 
(routing/fw/relayd), where the MASTER was becoming BACKUP for no obvious 
reason. I believed it was a network glitch, but couldn't trace it.

The problem after all was that they where pushing the limit of max pf states 
and relayd checks where failing. Not obvious to spot at all. I believe default 
is 20K.

pfctl -sm
pfctl -si

After increasing that limit with set limit states I've never had a glitch any 
more.

G

Reply via email to