On 1/8/2024 2:49 AM, Jaroslav Pulchart wrote:
> Hello

First, thank you for your work trying to chase this!
>
> I would like to report a regression triggered by recent change in
> Intel ICE Ethernet driver in the 6.6.9 linux kernel. The problem was
> bisected and the regression is triggered by
> fc4d6d136d42fab207b3ce20a8ebfd61a13f931f "ice: alter feature support
> check for SRIOV and LAG" commit and originally reported as part of
> https://lore.kernel.org/linux-mm/cak8ffz4dy+gtba40pm7nn5xchy+51w3sfxpqkqpqaksxyyx...@mail.gmail.com/T/#m5217c62beb03b3bc75d7dd4b1d9bab64a3e68826
> thread.

I think that's a bad bisect. There is no reason I could understand for
that change to cause a continuous or large leak, it really doesn't make
any sense.

Reverting it consistently helps? You're not just rewinding the tree back
to that point, right, just running 6.6.9 without that patch? (Sorry for
being pedantic, just trying to be certain.)

>> However, after the following patch we see that more NUMA nodes have
>> such a low amount of memory and that is causing constant reclaiming
>> of memory because it looks like something inside of the kernel ate all
>> the memory. This is right after the start of the system as well.
>
> I'm reporting it here as it is a different problem than the original
> thread. The commit introduces a low memory problem per each numa node
> of the first socket (node0 .. node3 in our case) and cause constant
> kswapd* 100% CPU usage. See attached 6.6.9-kswapd_usage.png. The low
> memory issue is nicely visible in "numastat -m", see attached files:
> * numastat_m-6.6.10_28GB_HP_ice_revert.txt = 6.6.9 with reverted ice commit
> * numastat_m-6.6.10_28GB_HP_no_revert.txt = 6.6.9 vanilla
> the server "is fresh" (after reboot), without running any application load.

OK, so the initial allocations of your system are running it out of
memory. Are you running jumbo frames on your ethernet interfaces? Do you
have /proc/slabinfo output from a working and a non-working boot?

>
> $ grep MemFree numastat_m-6.6.10_28GB_HP_ice_revert.txt numastat_m-6.6.10_28GB_HP_no_revert.txt
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree  2756.89 2754.86  100.39 2278.43  < ice fix is reverted, we have ~2GB free per numa, except one, like before == no issue
> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree  3551.29 1530.52 2212.04 3488.09
> ...
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree    127.52   66.49  120.23  263.47  < ice fix is present, we see just few MB free per each node, this will cause kswapd utilization!
> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree   3322.18 3134.47  195.55  879.17
> ...
>
> If you have some hints on how to debug what is actually occupying all
> that memory and some fix of the problem will be nice. We can provide
> testing and more reports if needed to analyze the issue. We reverted
> the commit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f as a workaround
> till we know a proper fix.

My first suspicion is that we're contributing to the problem with the
amount of memory consumed for receive descriptors. Can we see the
ethtool -S stats from the freshly booted system that's running out of
memory or doing OOM?

Also, can you send all the standard debugging info (at least once,
please): devlink dev info output and any other configuration specifics?
What networking config (bonding? anything else?) Do you have a
bugzilla.kernel.org bug yet where you can upload larger files like dmesg
and others?

Also, I'm curious if your problem goes away if you change / reduce the
number of queues per port, e.g. with ethtool -L eth0 combined 4?
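To make that concrete, something like the below is what I have in mind,
right after a fresh boot (eth0 is just a placeholder for one of the ice
ports on your system, and "combined 4" is only an example value):

  # capture state before any workload is started
  cat /proc/slabinfo > slabinfo-$(uname -r).txt
  ethtool -S eth0 > ethtool_S-eth0-$(uname -r).txt
  devlink dev info > devlink_dev_info-$(uname -r).txt

  # check the current per-port queue count, then reduce it
  ethtool -l eth0
  ethtool -L eth0 combined 4

  # and see whether per-node free memory recovers
  numastat -m | grep MemFree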
You also said something about reproducing when launching / destroying
virtual machines with VF passthrough? Can you reproduce the issue
without starting qemu, i.e. just doing bare-metal SR-IOV VF
creation/destruction via /sys/class/net/eth0/device/sriov_numvfs?
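If it does reproduce that way, something like this should exercise the
same path without qemu in the picture (eth0 and the VF count of 8 are
just placeholders here, adjust for your setup):

  # create and destroy VFs a few times on the PF
  for i in $(seq 1 5); do
      echo 8 > /sys/class/net/eth0/device/sriov_numvfs
      sleep 5
      echo 0 > /sys/class/net/eth0/device/sriov_numvfs
      sleep 5
  done

  # then check whether per-node free memory has dropped
  numastat -m | grep MemFree

Thanks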