Hi, I notice a regression report on Bugzilla [1]. Quoting from it:

> Hi,
>
> After I updated to 6.4 through Archlinux kernel update, suddenly I noticed
> random packet losses on my routers like nodes. I have these networking
> relevant config on my nodes
>
> 1. Using archlinux
> 2. Network config through systemd-networkd
> 3. Using bird2 for BGP routing, but not relevant to this bug.
> 4. Using nftables for traffic control, but seems not relevant to this bug.
> 5. Not using fail2ban like dymanic filtering tools, at least at L3/L4 level
>
> After I ruled out systemd-networkd, nftables related issues. I tracked down
> issues to kernel.
>
> Here's the tcpdump I'm seeing on one side of my node ""
>
> ```
> sudo tcpdump -i fios_wan port 38851
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on fios_wan, link-type EN10MB (Ethernet), snapshot length 262144
> bytes
> 10:33:06.073236 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> length 148
> 10:33:11.406607 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> length 148
> 10:33:16.739969 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> length 148
> 10:33:21.859856 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> length 148
> 10:33:27.193176 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> length 148
> 5 packets captured
> 5 packets received by filter
> 0 packets dropped by kernel
> ```
>
> But on the other side "[REDACTED_PUBLIC_IPv4_1]", tcpdump is replying packets
> in this wireguard stream. So packet is lost somewhere in the link.
>
> From the otherside, I can do "mtr" to "[BOS1_NODE]"'s public IP and found the
> moment the link got lost is right at "[BOS1_NODE]", that means
> "[BOS1_NODE]"'s networking stack completely drop the inbound packets from
> specific ip addresses.
>
> Some more digging
>
> 1. This situation began after booting in different delays. Sometimes can
> trigger after 30 seconds after booting, and sometimes will be after 18 hours
> or more.
> 2. It can envolve into worse case that when I do "ip neigh show", the ipv4
> ARP table and ipv6 neighbor discovery start to appear as "invalid", meaning
> the internet is completely loss.
> 3. When this happened to wan facing interface, it seems OK with lan facing
> interfaces. WAN interface was using Intel X710-T4L using i40e and lan side
> was using virtio
> 4. I tried to bisect in between 6.3 and 6.4, and the first bad commit it
> reports was "a3efabee5878b8d7b1863debb78cb7129d07a346". But this is not
> relevant to networking at all, maybe it's the wrong commit to look at. At the
> meantime, because I haven't found a reproducible way of 100% trigger the
> issue, it may be the case during bisect some "good" commits are actually bad.
> 5. I also tried to look at "dmesg", nothing interesting pop up. But I'll make
> it available upon request.
>
> This is my first bug reports. Sorry for any confusion it may lead to and
> thanks for reading.

See Bugzilla for the full thread.

Thorsten: The reporter had a bad bisect (some bad commits were marked as good
instead), hence the SoB chain for the (unrelated) culprit ipvu commit is in
the To: list. I also asked the reporter (also in To:) to provide dmesg and to
rerun the bisection, but he doesn't currently have a reliable reproducer. Is
this the best I can do?

Anyway, I'm adding this regression to be tracked in regzbot:

#regzbot introduced: a3efabee5878b8 https://bugzilla.kernel.org/show_bug.cgi?id=217678
#regzbot title: packet drop on Intel X710-T4L due to ipvu boot fix

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217678

-- 
An old man doll... just what I always wanted! - Clara