On 2021-03-22, Szél Gábor <gabor.s...@wantax.eu> wrote:
> Dear List!
>
> We make some tests, i think this is intel em driver (82571EB) bug!
>
>   * if i move aggr0 from em devices to bnx devices, everything will be fine!
>     (only change trunkport from em to bnx)
>   * if i change intel network card to other intel network card with
>     82571EB chipset, not working.
>   * if i copy network interfaces config to another server (clear openbsd
>     6.8 install) with 6x Intel I210 network cards, everything will be fine!
>   * if i move SSD from working intel configuration server (I210) to
>     PE210 (82571EB), not working.
>   * i tested with oBSD 6.7, the problem exists ., but before reinstall
>     this server, on oBSD 6.1, LACP + 82571EB is working correctly.
>
> we have many-many OpenBSD (router, firewall) installations, but we have 
> not yet experienced this problem. If possible, we use intel network cards.
>

First off, it would help to post the ifconfig output (preferably
in full) and the LACP status from the switch. That might give some
clues immediately.

One big difference between trunk and aggr is that trunk always uses
promiscuous mode; aggr doesn't (unless other software forces it).
This means that aggr relies on the filters on the NIC (e.g. receive
address filters / multicast filters) getting programmed correctly.
If this is caused by a NIC driver bug (or a hardware bug not properly
worked around by the driver) it's most likely in that area.

The NIC is normally (when not in promisc mode) programmed to
receive only packets sent to certain addresses. There's a small
table (the receive address registers, RAR, 15 entries on this NIC)
where you put a list of full destination MAC addresses to receive. If
there are more multicast addresses in use than fit there, another
filter table is used for the rest. Or the NIC can be set to
"multicast promiscuous", where it still uses the filter for unicast
traffic but accepts all multicast.
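
To make the filtering described above concrete, here is a minimal sketch
in C of the accept/drop decision. It is a toy model, not the em(4)
register layout: the struct, its field names and rx_accept() are made up
for illustration, and the hash-based multicast table and broadcast
handling are left out. The test address 01:80:c2:00:00:02 is the IEEE
slow-protocols multicast address that LACP frames are sent to, which is
why a missing multicast filter entry would matter for aggr.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RAR_ENTRIES 15  /* table size quoted above for this NIC */

/* Hypothetical model of a NIC receive filter (not the real layout). */
struct rx_filter {
	uint8_t rar[RAR_ENTRIES][6];  /* exact-match destination MACs */
	int     rar_used;             /* number of valid entries in rar[] */
	bool    unicast_promisc;      /* accept every unicast frame */
	bool    multicast_promisc;    /* accept every multicast frame */
};

/* The group bit is the least significant bit of the first octet. */
static bool
is_multicast(const uint8_t dst[6])
{
	return (dst[0] & 0x01) != 0;
}

/* Return true if a frame addressed to dst would be passed to the host. */
static bool
rx_accept(const struct rx_filter *f, const uint8_t dst[6])
{
	int i;

	if (is_multicast(dst) && f->multicast_promisc)
		return true;
	if (!is_multicast(dst) && f->unicast_promisc)
		return true;

	/* Otherwise the destination must match an exact-match entry. */
	for (i = 0; i < f->rar_used; i++)
		if (memcmp(f->rar[i], dst, 6) == 0)
			return true;
	return false;
}
```

In this model a frame to the LACP address is dropped unless that address
was programmed into the table or multicast_promisc is set. That is the
kind of failure that promiscuous mode (as used by trunk) would hide.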

I'm not really a driver hacker, but I have an idea for one thing you
can try if nobody has a better one. In if_em.c find this:

1438         reg_rctl = E1000_READ_REG(&sc->hw, RCTL);
1439         reg_rctl &= ~(E1000_RCTL_MPE | E1000_RCTL_UPE);
1440         ifp->if_flags &= ~IFF_ALLMULTI;
1441 
1442         if (ifp->if_flags & IFF_PROMISC || ac->ac_multirangecnt > 0 ||
1443             ac->ac_multicnt > MAX_NUM_MULTICAST_ADDRESSES) {
1444                 ifp->if_flags |= IFF_ALLMULTI;
1445                 reg_rctl |= E1000_RCTL_MPE;
1446                 if (ifp->if_flags & IFF_PROMISC)
1447                         reg_rctl |= E1000_RCTL_UPE;
1448         } else {

Change line 1442 to this:

1442         if (1 || ifp->if_flags & IFF_PROMISC || ac->ac_multirangecnt > 0 ||

and build/install a new kernel. This stops the driver from using the
multicast filter and makes the NIC accept all multicast packets
instead. If this fixes things, the problem is likely to be in the
multicast filter programming; it varies a bit between models and the
driver might have missed some special case. If not then, well, at
least it rules that out.
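
For anyone who wants to see the effect of that change in isolation, here
is a small standalone sketch of the same decision written as a pure
function. It is not driver code: rctl_filter_bits() and its parameters
are names I made up, the two bit values are placeholders for
illustration (the real E1000_RCTL_* definitions are in the driver's
headers), and max_multi stands in for MAX_NUM_MULTICAST_ADDRESSES.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; check the driver headers for the real ones. */
#define RCTL_UPE 0x08u  /* unicast promiscuous enable */
#define RCTL_MPE 0x10u  /* multicast promiscuous enable */

/*
 * Compute the promiscuous bits of the receive control register.
 * force_allmulti models the "1 ||" test hack: when true, multicast
 * promiscuous mode is always enabled and the multicast filter is unused.
 */
static uint32_t
rctl_filter_bits(uint32_t rctl, bool promisc, int multirangecnt,
    int multicnt, int max_multi, bool force_allmulti)
{
	/* Start from "filtered" for both unicast and multicast. */
	rctl &= ~(RCTL_MPE | RCTL_UPE);

	if (force_allmulti || promisc || multirangecnt > 0 ||
	    multicnt > max_multi) {
		rctl |= RCTL_MPE;
		if (promisc)
			rctl |= RCTL_UPE;
	}
	return rctl;
}
```

With force_allmulti set, the multicast promiscuous bit comes out set even
for an interface with only a couple of multicast addresses, while unicast
filtering is left alone. So the test only takes the multicast filter out
of the picture, nothing else.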

Another big difference is that trunk uses the MAC address from a member
interface, whereas aggr creates a random address by default (you can
force it to a particular address with "lladdr aa:bb:cc:dd:ee:ff").
There's clearly some problem in this area with the 82571 that em works
around in if_em_hw.c (around line 7547; "locally administered address"
there means the case where the MAC address has been reset). Perhaps
it's not handled completely.
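
As a side note on "locally administered": in the IEEE 802 address format
this is a single bit in the first octet, so it is easy to check which
kind of address an interface ended up with. A minimal sketch follows; the
helper names are mine and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the first octet marks a locally administered address. */
static bool
mac_is_locally_administered(const uint8_t mac[6])
{
	return (mac[0] & 0x02) != 0;
}

/* Bit 0 of the first octet marks a group (multicast) address. */
static bool
mac_is_multicast(const uint8_t mac[6])
{
	return (mac[0] & 0x01) != 0;
}
```

The example address aa:bb:cc:dd:ee:ff from above is a locally
administered unicast address by this test (0xaa has bit 1 set and bit 0
clear), so forcing it with lladdr would still exercise whatever special
handling the driver has for such addresses.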

Of course, if you need it working there is also the option to use
trunk instead of aggr, or to run something (e.g. tcpdump) to force
the NIC into promiscuous mode, but it would be nice to get this
figured out.

