Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> Hi,
> 
> After I updated to 6.4 through Archlinux kernel update, suddenly I noticed 
> random packet losses on my routers like nodes. I have these networking 
> relevant config on my nodes
> 
> 1. Using archlinux
> 2. Network config through systemd-networkd
> 3. Using bird2 for BGP routing, but not relevant to this bug.
> 4. Using nftables for traffic control, but seems not relevant to this bug. 
> 5. Not using fail2ban like dymanic filtering tools, at least at L3/L4 level
> 
> After I ruled out systemd-networkd, nftables related issues. I tracked down 
> issues to kernel.
> 
> Here's the tcpdump I'm seeing on one side of my node ""
> 
> ```
> sudo tcpdump -i fios_wan port 38851
> tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
> listening on fios_wan, link-type EN10MB (Ethernet), snapshot length 262144 bytes
> 10:33:06.073236 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP, length 148
> 10:33:11.406607 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP, length 148
> 10:33:16.739969 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP, length 148
> 10:33:21.859856 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP, length 148
> 10:33:27.193176 IP [BOS1_NODE].38851 > [REDACTED_PUBLIC_IPv4_1].38851: UDP, length 148
> 5 packets captured
> 5 packets received by filter
> 0 packets dropped by kernel
> ```
> 
> But on the other side "[REDACTED_PUBLIC_IPv4_1]", tcpdump is replying packets 
> in this wireguard stream. So packet is lost somewhere in the link.
> 
> From the otherside, I can do "mtr" to "[BOS1_NODE]"'s public IP and found the 
> moment the link got lost is right at "[BOS1_NODE]", that means 
> "[BOS1_NODE]"'s networking stack completely drop the inbound packets from 
> specific ip addresses.
> 
> Some more digging
> 
> 1. This situation began after booting in different delays. Sometimes can 
> trigger after 30 seconds after booting, and sometimes will be after 18 hours 
> or more.
> 2. It can envolve into worse case that when I do "ip neigh show", the ipv4 
> ARP table and ipv6 neighbor discovery start to appear as "invalid", meaning 
> the internet is completely loss.
> 3. When this happened to wan facing interface, it seems OK with lan facing 
> interfaces. WAN interface was using Intel X710-T4L using i40e and lan side 
> was using virtio
> 4. I tried to bisect in between 6.3 and 6.4, and the first bad commit it 
> reports was "a3efabee5878b8d7b1863debb78cb7129d07a346". But this is not 
> relevant to networking at all, maybe it's the wrong commit to look at. At the 
> meantime, because I haven't found a reproducible way of 100% trigger the 
> issue, it may be the case during bisect some "good" commits are actually bad. 
> 5. I also tried to look at "dmesg", nothing interesting pop up. But I'll make 
> it available upon request.
> 
> This is my first bug reports. Sorry for any confusion it may lead to and 
> thanks for reading.

See Bugzilla for the full thread.

Thorsten: The reporter had a bad bisect (some bad commits were marked as
good instead), hence the SoB chain for the (unrelated) culprit ipvu commit
is in the To: list. I also asked the reporter (also in To:) to provide dmesg
and to rerun the bisection, but he doesn't currently have a reliable
reproducer. Is this the best I can do?

Anyway, I'm adding this regression to be tracked in regzbot:

#regzbot introduced: a3efabee5878b8 
https://bugzilla.kernel.org/show_bug.cgi?id=217678
#regzbot title: packet drop on Intel X710-T4L due to ipvu boot fix

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217678

-- 
An old man doll... just what I always wanted! - Clara
