On 2021-03-22, Szél Gábor <gabor.s...@wantax.eu> wrote:
> Dear List!
>
> We made some tests; I think this is an Intel em driver (82571EB) bug!
>
> * If I move aggr0 from the em devices to bnx devices, everything works fine!
>   (only changing trunkport from em to bnx)
> * If I swap the Intel network card for another Intel card with the
>   82571EB chipset, it does not work.
> * If I copy the network interface config to another server (clean OpenBSD
>   6.8 install) with 6x Intel I210 network cards, everything works fine!
> * If I move the SSD from the working Intel configuration server (I210) to
>   the PE210 (82571EB), it does not work.
> * I tested with OpenBSD 6.7 and the problem exists, but before this server
>   was reinstalled, LACP + 82571EB worked correctly on OpenBSD 6.1.
>
> We have many, many OpenBSD (router, firewall) installations, but we have
> not experienced this problem before. Where possible, we use Intel network cards.
>
First off, it would be helpful to provide ifconfig output (preferably in full) and the LACP status from the switch. This might give some clues immediately.

One big difference between trunk and aggr is that trunk always uses promiscuous mode, while aggr doesn't (unless other software forces it). This means aggr relies on the receive filters on the NIC (e.g. receive address filters / multicast filters) being programmed correctly. If this is caused by a NIC driver bug (or a hardware bug not properly worked around by the driver), it's most likely in that area.

When not in promiscuous mode, the NIC is normally programmed to receive only packets sent to certain addresses. There's a small table (the receive address registers, RAR; 15 entries on this NIC) holding a list of full destination MAC addresses to receive. If more multicast addresses are in use than fit in that table, another filter table is used for them. Alternatively, the NIC can be set to "multicast promiscuous", where it still uses the filter for normal traffic but accepts all multicast.

I'm not really a driver hacker, but I have an idea for one thing you can try if nobody has a better one. In if_em.c, find this:

    1438        reg_rctl = E1000_READ_REG(&sc->hw, RCTL);
    1439        reg_rctl &= ~(E1000_RCTL_MPE | E1000_RCTL_UPE);
    1440        ifp->if_flags &= ~IFF_ALLMULTI;
    1441
    1442        if (ifp->if_flags & IFF_PROMISC || ac->ac_multirangecnt > 0 ||
    1443            ac->ac_multicnt > MAX_NUM_MULTICAST_ADDRESSES) {
    1444                ifp->if_flags |= IFF_ALLMULTI;
    1445                reg_rctl |= E1000_RCTL_MPE;
    1446                if (ifp->if_flags & IFF_PROMISC)
    1447                        reg_rctl |= E1000_RCTL_UPE;
    1448        } else {

and change line 1442 to this:

    1442        if (1 || ifp->if_flags & IFF_PROMISC || ac->ac_multirangecnt > 0 ||

then build and install a new kernel. This stops the driver from using the multicast filter and makes it accept all multicast packets instead. If this fixes things, the problem is likely with the multicast filter programming, which varies a bit between models and might have missed some special case. If not, well, at least it rules that out.
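To make the filter logic above concrete, here is a small user-space C model of it. It is a sketch under stated assumptions, not driver code: RCTL_UPE and RCTL_MPE are assumed to match the E1000_RCTL_UPE/E1000_RCTL_MPE bits, MAX_MCAST_HYPOTHETICAL is a placeholder standing in for MAX_NUM_MULTICAST_ADDRESSES, whether an address hits the multicast hash table is passed in as a flag rather than computed, and broadcast handling is ignored. The function names are hypothetical; force_allmulti plays the role of the "1 ||" change.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match E1000_RCTL_UPE / E1000_RCTL_MPE; verify in if_em_hw.h. */
#define RCTL_UPE 0x00000008u	/* unicast promiscuous enable */
#define RCTL_MPE 0x00000010u	/* multicast promiscuous enable */

/* Hypothetical placeholder for MAX_NUM_MULTICAST_ADDRESSES. */
#define MAX_MCAST_HYPOTHETICAL 128

#define MAC_LEN 6

/*
 * Models the branch at lines 1442-1448: decide which promiscuous bits
 * to set in RCTL. force_allmulti models the "1 ||" debugging change.
 */
uint32_t
rx_filter_bits(bool promisc, int multirangecnt, int multicnt,
    bool force_allmulti)
{
	uint32_t rctl = 0;

	if (force_allmulti || promisc || multirangecnt > 0 ||
	    multicnt > MAX_MCAST_HYPOTHETICAL) {
		rctl |= RCTL_MPE;		/* accept all multicast */
		if (promisc)
			rctl |= RCTL_UPE;	/* accept all unicast too */
	}
	return rctl;
}

/*
 * Models receive filtering for one frame. rar holds the exact-match
 * entries (up to 15 on this NIC); mta_hit says whether dst hashes to a
 * set bit in the multicast table, which is not modelled here.
 */
bool
rx_accept(const uint8_t rar[][MAC_LEN], int nrar, uint32_t rctl,
    bool mta_hit, const uint8_t dst[MAC_LEN])
{
	bool mcast = (dst[0] & 0x01) != 0;	/* group (multicast) bit */
	int i;

	if (!mcast && (rctl & RCTL_UPE))
		return true;			/* unicast promiscuous */
	if (mcast && (rctl & RCTL_MPE))
		return true;			/* multicast promiscuous */
	for (i = 0; i < nrar; i++)		/* exact match in RAR */
		if (memcmp(rar[i], dst, MAC_LEN) == 0)
			return true;
	return mcast && mta_hit;		/* fall back to hash table */
}
```

For example, with our own address aa:bb:cc:dd:ee:ff in RAR and an address that misses in the hash table, rx_accept() rejects the Slow Protocols multicast address 01:80:c2:00:00:02 (the destination of LACPDUs) when rctl is 0, but accepts it once rx_filter_bits() has been called with force_allmulti set. That is why the experiment can help: if LACP frames were being dropped by a mis-programmed multicast filter, forcing MPE would bypass it.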
Another big difference is that trunk uses the MAC address of a member interface, while aggr creates a random address by default (you can force a particular address with "lladdr aa:bb:cc:dd:ee:ff"). There's clearly some problem in this area with the 82571, which em works around in if_em_hw.c (around line 7547), involving the locally administered address (i.e. where the MAC address has been changed from the one burned into the card). Perhaps that workaround doesn't handle every case.

Of course, if you need it working now, there is also the option of using trunk instead of aggr, or running something (e.g. tcpdump) to force the NIC into promiscuous mode, but it would be nice to get this figured out.
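Whether an address counts as locally administered comes down to one bit in the first octet (0x02; 0x01 marks multicast). The C sketch below, with hypothetical helper names, checks those bits, parses an address in the form given to lladdr, and builds a random locally administered unicast address in the general way such addresses are formed. It is an illustration only, not aggr's actual code. Note that the example address aa:bb:cc:dd:ee:ff is itself locally administered (0xaa has bit 0x02 set), so it still counts as one by that definition.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAC_LEN 6
#define MAC_GROUP_BIT 0x01	/* I/G bit: multicast/broadcast */
#define MAC_LOCAL_BIT 0x02	/* U/L bit: locally administered */

bool
mac_is_local(const uint8_t mac[MAC_LEN])
{
	return (mac[0] & MAC_LOCAL_BIT) != 0;
}

bool
mac_is_unicast(const uint8_t mac[MAC_LEN])
{
	return (mac[0] & MAC_GROUP_BIT) == 0;
}

/* Parse "aa:bb:cc:dd:ee:ff"; rejects short input and trailing junk. */
bool
mac_parse(const char *s, uint8_t mac[MAC_LEN])
{
	unsigned int b[MAC_LEN];
	char extra;
	int i;

	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c", &b[0], &b[1], &b[2],
	    &b[3], &b[4], &b[5], &extra) != MAC_LEN)
		return false;
	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (uint8_t)b[i];
	return true;
}

/*
 * Hypothetical helper, not aggr's actual code: make a random locally
 * administered unicast address. rand() keeps the sketch portable; on
 * OpenBSD arc4random_buf() would be the natural choice.
 */
void
mac_random_local(uint8_t mac[MAC_LEN], unsigned int seed)
{
	int i;

	srand(seed);
	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (uint8_t)(rand() & 0xff);
	mac[0] &= (uint8_t)~MAC_GROUP_BIT;	/* clear group bit: unicast */
	mac[0] |= MAC_LOCAL_BIT;		/* set U/L bit: local */
}
```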