--On Monday, October 04, 2010 12:11:01 PM +0000 Stuart Henderson <s...@spacehopper.org> wrote:
> On 2010-10-03, Devin Reade <g...@gno.org> wrote:
> <snip *excellent* write-up of the problem and network layout;
> if only all problem reports were this good!>

Thanks. I'm also a developer, just not in the OpenBSD kernel.

> Until you can move to a dedicated nic, I would
> suggest switching to using syncpeer in pfsync config, and ipsec

[snip]

I forgot to include /etc/hostname.pfsync0, but it is using syncpeer
on vr0.

> So basically there are untrusted machines on the interface on which you
> also run pfsync.

That depends on your definition of untrusted. vr0 being the DMZ, all
machines there are under my control and I'm pretty confident that
there's nothing malicious happening. It is true, though, that there
is traffic other than pfsync on that segment. Are you suspecting that
other traffic (and in particular avahi-daemon) is interfering with
pfsync?

The dual-port NICs arrived, so I can put pfsync on its own interface
now and see if that affects the situation.

One other recent datapoint: In following Kenneth's suggestion of
breaking into the kernel, I disabled the watchdog and set

    ddb.panic=1
    ddb.console=1

Since then I have had time to trigger only one failure so far (again,
no panic, no automatic drop to ddb), but in that case when I did a
'continue' in ddb, the failed machine returned to operation. So it
looks like the hang may not have been a permanent hang, but just long
enough to (previously) trigger the watchdog, which had a timeout of
32 seconds. But that's still inconclusive.

(I have nothing else useful to add yet re ddb.)

Devin