On Mon, Apr 6, 2026 at 10:04 PM Marc Harvey <[email protected]> wrote:
>
> On Mon, Apr 6, 2026 at 7:44 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Mon, 06 Apr 2026 03:03:36 +0000 Marc Harvey wrote:
> > > Allow independent control over receive and transmit enablement states
> > > for aggregated ports in the team driver.
> > >
> > > The motivation is that IEE 802.3ad LACP "independent control" can't
> > > be implemented for the team driver currently. This was added to the
> > > bonding driver in commit 240fd405528b ("bonding: Add independent
> > > control state machine").
> > >
> > > This series also has a few patches that add tests to show that the old
> > > coupled enablement still works and that the new decoupled enablement
> > > works as intended (4, 5, and 10).
> > >
> > > There are three patches with small fixes as well, with the goal of
> > > making the final decoupling patch clearer (1, 2, and 3).
> >
> > activebackup:
> >
> > TAP version 13
> > 1..1
> > # overriding timeout to 2400
> > # selftests: drivers/net/team: teamd_activebackup.sh
> > # Setting up two-link aggregation for runner activebackup
> > # Teamd version is: teamd 1.32
> > # Conf files are /tmp/tmp.ydjNK9Um7H and /tmp/tmp.xZuc3cWbN0
> > # This program is not intended to be run as root.
> > # This program is not intended to be run as root.
> > # Created team devices
> > # Teamd PIDs are 21457 and 21461
> > # exec of "ip link set eth0 up" failed: No such file or directory
> > # exec of "ip link set eth0 up" failed: No such file or directory
> > # exec of "ip link set eth1 up" failed: No such file or directory
> > # exec of "ip link set eth1 up" failed: No such file or directory
> > # PING fd00::2 (fd00::2) 56 data bytes
> > # 64 bytes from fd00::2: icmp_seq=1 ttl=64 time=0.753 ms
> > #
> > # --- fd00::2 ping statistics ---
> > # 1 packets transmitted, 1 received, 0% packet loss, time 0ms
> > # rtt min/avg/max/mdev = 0.753/0.753/0.753/0.000 msPacket count for
> > test_team2 was 0
> > # Waiting for eth0 in ns2-lZ0gqd to stop receiving
> > # Packet count for eth0 was 0Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Waiting for eth1 in ns2-lZ0gqd to stop receiving
> > # Packet count for eth1 was 0Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: teamd active backup runner test [FAIL]
> > # Traffic did not reach team interface in NS2.
> > # Tearing down two-link aggregation
> > # Failed to kill daemon: Timer expired
> > # Failed to kill daemon: Timer expired
> > # Sending sigkill to teamd for test_team1
> > # rm: cannot remove '/var/run/teamd/test_team1.pid': No such file or
> > directory
> > # rm: cannot remove '/var/run/teamd/test_team1.sock': No such file or
> > directory
> > # Sending sigkill to teamd for test_team2
> > # rm: cannot remove '/var/run/teamd/test_team2.pid': No such file or
> > directory
> > # rm: cannot remove '/var/run/teamd/test_team2.sock': No such file or
> > directory
> > not ok 1 selftests: drivers/net/team: teamd_activebackup.sh # exit=1
> >
> >
> > transmit_failover:
> >
> > TAP version 13
> > 1..1
> > # overriding timeout to 2400
> > # selftests: drivers/net/team: transmit_failover.sh
> > # Error: ipv6: address not found.
> > # Setting team in ns2-yxjiUo to mode roundrobin
> > # Error: ipv6: address not found.
> > # Setting team in ns1-Jht6kA to mode broadcast
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'broadcast' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > # Setting team in ns1-Jht6kA to mode roundrobin
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'roundrobin' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > # Setting team in ns1-Jht6kA to mode random
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # Packet count for eth0 was 0
> > # Packet count for eth1 was 0
> > # TEST: Failover of 'random' test [FAIL]
> > # eth0 not transmitting when both links enabled
> > not ok 1 selftests: drivers/net/team: transmit_failover.sh # exit=1
> > --
> > pw-bot: cr
>
> Apologies for all of the test failures. Before sending this revision,
> I ran each test thousands of times and observed no failures, so I
> thought the flakiness would be resolved.
>
> No matter what I try, I can't recreate either issue on my end. I've
> tried building with the exact config from one of the test runs
> (https://netdev-ctrl.bots.linux.dev/logs/vmksft/bonding/results/590921/).
> I've tried stressing the VM according to
> https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style#reproducing-unstable-tests
> (this makes the tests time out, but I can still see traffic). I've
> tried using the netdev-testing/net-next-2026-04-06--09-00 kernel
> source. I've tried in nested and unnested virtual machines. I've also
> tried running multiple test instances in parallel, but nothing
> recreates the issues. The issues seem related to tcpdump, but without
> reproducing them, I can only guess. Any suggestions for running the
> tests exactly as the CI does would be greatly appreciated.
>
> - Marc

Thank you very much to [email protected], who figured out how to reproduce the issue on Fedora.

Fedora's /etc/services maps TCP port 1234 to the "search-agent" service (a standard mapping). tcpdump uses that mapping to replace port numbers with service names in its output. So the tests were looking for ${ip_address}.1234, but tcpdump was printing ${ip_address}.search-agent.

What is strange is that the test already uses tcpdump's "-n" option: "Don't convert addresses (i.e., host addresses, port numbers, etc.) to names." It turns out that Fedora ships a patched tcpdump that splits the normal "-n" option into two levels. "-n" only disables host address lookups, and "-nn" additionally disables port and protocol number lookups. The tcpdump invocation used by the selftests only passes "-n".

What's stranger is that passing "-nn" to tcpdump is actually portable. With or without the Fedora patch, tcpdump treats "-n" as a counter that is incremented each time the flag appears: https://github.com/the-tcpdump-group/tcpdump/blob/master/tcpdump.c#L1915 (thanks again to Kuniyuki for discovering this).

For v6, I will change the TCP port to one that is not assigned to any service. I will also make the tcpdump helper function in the net/forwarding lib use "-nn" instead of "-n".

- Marc
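To make the counter argument concrete, here is a minimal bash sketch. count_n_flags and ports_stay_numeric are hypothetical helpers written only for illustration. They are not code from tcpdump or from the selftests.

The first helper counts "-n" the way a getopt-style parser does (bash's getopts builtin here), where "-nn" is parsed as two "-n" options. The second helper models the two behaviours described above, as an assumption. An unpatched tcpdump stops resolving port names once the count is non-zero. The Fedora build only does so once the count reaches 2.

```bash
#!/bin/bash
# Illustrative model only (hypothetical helpers, not tcpdump code).

# Count "-n" occurrences as a getopt-style parser sees them:
# "-nn" is parsed as two separate "-n" options, just like "-n -n".
count_n_flags()
{
	local OPTIND=1 opt nflag=0

	# The leading ':' silences getopts' own error messages; "i:" lets the
	# helper accept "-i <ifname>" as a typical tcpdump command line would.
	while getopts ":ni:" opt; do
		case "$opt" in
		n) nflag=$((nflag + 1)) ;;
		*) ;;	# -i and any other option are ignored here
		esac
	done
	echo "$nflag"
}

# Assumed model of when port numbers stay numeric:
#   unpatched tcpdump: any non-zero count disables all name lookups
#   Fedora tcpdump:    port/protocol names are only disabled at count >= 2
ports_stay_numeric()
{
	local nflag=$1 build=$2

	if [ "$build" = fedora ]; then
		[ "$nflag" -ge 2 ]
	else
		[ "$nflag" -ge 1 ]
	fi
}

for args in "-n -i eth0" "-nn -i eth0"; do
	# Word splitting of $args is intentional: each word becomes an argument.
	n=$(count_n_flags $args)
	for build in unpatched fedora; do
		if ports_stay_numeric "$n" "$build"; then
			result="numbers"
		else
			result="service names"
		fi
		echo "tcpdump $args ($build): count=$n, ports shown as $result"
	done
done
```

Under these assumptions, "-n" keeps ports numeric only on the unpatched build, while "-nn" keeps them numeric on both. That is why switching the helper to "-nn" should be safe everywhere.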

