On Tue, Jul 18, 2023 at 07:51:24AM +0700, Bagas Sanjaya wrote:
> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
>
> > Hi,
> >
> > After I updated to 6.4 through Archlinux kernel update, suddenly I noticed
> > random packet losses on my routers like nodes. I have these networking
> > relevant config on my nodes
> >
> > 1. Using archlinux
> > 2. Network config through systemd-networkd
> > 3. Using bird2 for BGP routing, but not relevant to this bug.
> > 4. Using nftables for traffic control, but seems not relevant to this bug.
> > 5. Not using fail2ban like dymanic filtering tools, at least at L3/L4 level
> >
> > After I ruled out systemd-networkd, nftables related issues. I tracked down
> > issues to kernel.
> >
> > Here's the tcpdump I'm seeing on one side of my node ""
> >
> > ```
> > sudo tcpdump -i fios_wan port 38851
> > tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> > listening on fios_wan, link-type EN10MB (Ethernet), snapshot length 262144
> > bytes
> > 10:33:06.073236 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> > length 148
> > 10:33:11.406607 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> > length 148
> > 10:33:16.739969 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> > length 148
> > 10:33:21.859856 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> > length 148
> > 10:33:27.193176 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP,
> > length 148
> > 5 packets captured
> > 5 packets received by filter
> > 0 packets dropped by kernel
> > ```
> >
> > But on the other side "[REDACTED_PUBLIC_IPv4_1]", tcpdump is replying
> > packets in this wireguard stream. So packet is lost somewhere in the link.
> >
> > From the otherside, I can do "mtr" to "[BOS1_NODE]"'s public IP and found
> > the moment the link got lost is right at "[BOS1_NODE]", that means
> > "[BOS1_NODE]"'s networking stack completely drop the inbound packets from
> > specific ip addresses.
> >
> > Some more digging
> >
> > 1. This situation began after booting in different delays. Sometimes can
> > trigger after 30 seconds after booting, and sometimes will be after 18
> > hours or more.
> > 2. It can envolve into worse case that when I do "ip neigh show", the ipv4
> > ARP table and ipv6 neighbor discovery start to appear as "invalid", meaning
> > the internet is completely loss.
> > 3. When this happened to wan facing interface, it seems OK with lan facing
> > interfaces. WAN interface was using Intel X710-T4L using i40e and lan side
> > was using virtio
> > 4. I tried to bisect in between 6.3 and 6.4, and the first bad commit it
> > reports was "a3efabee5878b8d7b1863debb78cb7129d07a346". But this is not
> > relevant to networking at all, maybe it's the wrong commit to look at. At
> > the meantime, because I haven't found a reproducible way of 100% trigger
> > the issue, it may be the case during bisect some "good" commits are
> > actually bad.
> > 5. I also tried to look at "dmesg", nothing interesting pop up. But I'll
> > make it available upon request.
> >
> > This is my first bug reports. Sorry for any confusion it may lead to and
> > thanks for reading.
>
> See Bugzilla for the full thread.
>
> Thorsten: The reporter had a bad bisect (some bad commits were marked as good
> instead), hence SoB chain for culprit (unrelated) ipvu commit is in To:
> list. I also asked the reporter (also in To:) to provide dmesg and request
> rerunning bisection, but he doesn't currently have a reliable reproducer.
> Is it the best I can do?
>
> Anyway, I'm adding this regression to be tracked in regzbot:
>
> #regzbot introduced: a3efabee5878b8
> https://bugzilla.kernel.org/show_bug.cgi?id=217678
> #regzbot title: packet drop on Intel X710-T4L due to ipvu boot fix
>

This time, the bisection points to the v6.4 networking pull, so:

#regzbot introduced: 6e98b09da931a0

(also Cc: Linus.)

Thanks.

-- 
An old man doll... just what I always wanted! - Clara