On Mon, Oct 28, 2024 at 5:07 PM Vitaliy Makkoveev <o...@bsdbox.dev> wrote:
> > On 29 Oct 2024, at 00:55, Stuart Henderson <s...@spacehopper.org> wrote:
> >
> > On 2024/10/29 00:11, Vitaliy Makkoveev wrote:
> >>> - Set up vio(4) interfaces with -tcplro and enable wg(4) interfaces as
> >>> usual (In my case, I'm routing IPv4 and IPv6 traffic through wg(4)
> >>> tunnel).
> >>
> >> So, problem lays in vio(4).
> >
> > Or perhaps that is a trigger but the problem is elsewhere.
>
> Look. wg(4) is suspicious by itself, so it will be very useful to
> determine where the problem lays, within wg(4) or not.
>
> em(4) has no LRO support, but it had [1] and may be still has some
> problems with TSO, so I’m not surprised that wg(4) crashes trying to
> alloc mbuf(9) on hosts with em(4) too.
>
> 1.
> http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/if_em.c.diff?r1=1.373&r2=1.374
>
> >
> > bentley@ saw similar wg(4) problems on a machine with just em(4):
> >
> > em0 at pci0 dev 31 function 6 "Intel I219-V" rev 0x21: msi, address 94:c6:91:a3:6d:8a
> >
> > Now, if I looked at the right files, only ix(4) vm(4) vio(4) have LRO -
> > em(4) does not.
> >
> > We do support TSO on some em(4). Though, while I am not certain, I don't
> > think we do on I219-V... Anthony, do you still have that machine available?
> > Can you do an "ifconfig em hwfeatures" please so we can be sure?
> >
> > My best guess from the information I have (I don't think it's possible
> > to map from a dmesg attach line to a mac_type without more information -
> > the pci id isn't printed) is that it's an em_pch_spt which doesn't do
> > TSO...

I had unpredictable crashes on an octeon EdgeRouter ERLite that was deployed at a client's site. We had no incidents for months until I deployed wg(4). Unfortunately I had no serial console available for that machine, and the project ended up cancelled, so the machine was removed from there. As far as I recall, the crashes didn't show up on the other EdgeRouter I used during wg(4) testing, so maybe it's something traffic-related.
I haven't touched the machine since decommissioning it, and I still have it available. I logged most of the traffic headers there, so I may be able to extract some info from that. I will try to gather more info. Interesting to see that this problem is bigger than octeon.