Hi, Thorsten here, the Linux kernel's regression tracker.

On 11.01.24 09:26, Jaroslav Pulchart wrote:
>> On 1/8/2024 2:49 AM, Jaroslav Pulchart wrote:
>> First, thank you for your work trying to chase this!
>>
>>> I would like to report a regression triggered by a recent change in
>>> the Intel ice Ethernet driver in the 6.6.9 Linux kernel. The problem
>>> was bisected, and the regression is triggered by commit
>>> fc4d6d136d42fab207b3ce20a8ebfd61a13f931f ("ice: alter feature support
>>> check for SRIOV and LAG"), originally reported as part of the
>>> https://lore.kernel.org/linux-mm/cak8ffz4dy+gtba40pm7nn5xchy+51w3sfxpqkqpqaksxyyx...@mail.gmail.com/T/#m5217c62beb03b3bc75d7dd4b1d9bab64a3e68826
>>> thread.
>>
>> I think that's a bad bisect. There is no reason I can think of for
>> that change to cause a continuous or large leak; it really doesn't
>> make any sense. Reverting it consistently helps? You're not just
>> rewinding the tree back to that point, right? Just running 6.6.9
>> without that patch? (Sorry for being pedantic, just trying to be
>> certain.)
>
> Reverting just the single bisected commit consistently helps for >=
> 6.6.9 as well as for the current 6.7.
> We cannot use any new Linux kernel without reverting it, due to this
> extra memory utilization.
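For anyone retesting, reverting only that one commit on top of a release
tag (rather than rewinding the whole tree) looks roughly like the sketch
below. This is only a sketch: the hash is the mainline id cited above, a
stable branch such as 6.6.9 carries a backport with a different id (hence
the lookup by subject), and the build/install steps depend entirely on the
local configuration.

  # mainline (e.g. v6.7): the cited hash exists and can be reverted directly
  $ git checkout v6.7
  $ git revert --no-edit fc4d6d136d42fab207b3ce20a8ebfd61a13f931f
  # linux-stable (e.g. v6.6.9): the backport has a different id, find it by subject
  $ git log --oneline v6.6.8..v6.6.9 -- drivers/net/ethernet/intel/ice | \
        grep -i 'alter feature support check for SRIOV and LAG'
  $ git revert --no-edit <id printed above>
  # rebuild and reinstall (steps vary per setup)
  $ make olddefconfig && make -j"$(nproc)" && make modules_install && make install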
Quick query: what's the status wrt this regression? Looks like nothing
happened in the past week.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

>>>> However, after the following patch we see that more NUMA nodes have
>>>> such a low amount of memory, and that is causing constant reclaiming
>>>> of memory because it looks like something inside the kernel ate all
>>>> the memory. This is the case right after system start as well.
>>>
>>> I'm reporting it here as it is a different problem than the original
>>> thread. The commit introduces a low-memory problem on each NUMA node
>>> of the first socket (node0 .. node3 in our case) and causes constant
>>> kswapd* 100% CPU usage. See the attached 6.6.9-kswapd_usage.png. The
>>> low-memory issue is nicely visible in "numastat -m", see the attached
>>> files:
>>> * numastat_m-6.6.10_28GB_HP_ice_revert.txt >= 6.6.9 with the ice
>>>   commit reverted
>>> * numastat_m-6.6.10_28GB_HP_no_revert.txt >= 6.6.9 vanilla
>>> The server "is fresh" (right after reboot), without any application
>>> load running.
>>
>> OK, so the initial allocations of your system are running it out of
>> memory.
>>
>> Are you running jumbo frames on your ethernet interfaces?
>
> Yes, we are (MTU 9000).
>
>> Do you have /proc/slabinfo output from a working/non-working boot?
>
> Yes, I have a complete sos report, so I can pick up files from there.
> See attached:
> slabinfo.vanila (non-working)
> slabinfo.reverted (working)
>
>>> $ grep MemFree numastat_m-6.6.10_28GB_HP_ice_revert.txt numastat_m-6.6.10_28GB_HP_no_revert.txt
>>> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree 2756.89 2754.86 100.39 2278.43
>>>   < ice fix is reverted: we have ~2GB free per NUMA node, except one,
>>>     like before == no issue
>>> numastat_m-6.6.10_28GB_HP_ice_revert.txt:MemFree 3551.29 1530.52 2212.04 3488.09
>>> ...
>>> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree 127.52 66.49 120.23 263.47
>>>   < ice fix is present: we see just a few MB free per node, which
>>>     causes the kswapd utilization!
>>> numastat_m-6.6.10_28GB_HP_no_revert.txt:MemFree 3322.18 3134.47 195.55 879.17
>>> ...
>>>
>>> Any hints on how to debug what is actually occupying all that memory,
>>> and a fix for the problem, would be nice. We can provide testing and
>>> more reports if needed to analyze the issue. We reverted commit
>>> fc4d6d136d42fab207b3ce20a8ebfd61a13f931f as a workaround until we
>>> know a proper fix.
>>
>> My first suspicion is that we're contributing to the problem by running
>> out of receive descriptor memory.
>>
>> Can we see the ethtool -S stats from the freshly booted system that's
>> running out of memory or doing OOM? Also, all the standard debugging
>> info (at least once, please): devlink dev info, any other configuration
>> specifics? What networking config (bonding? anything else?)
>
> The system is not in OOM; it starts to continuously utilize the four
> kswapd0-4 processes of the NUMA nodes on the first CPU socket (each at
> 100%, all doing swap in/out) once applications start using the system,
> because of the "low memory".
>
> We have two 25G 2P E810-XXV adapters. The first port of each (em1 +
> p3p1) is connected and they're bonded in LACP. The second ports (em2
> and p3p2) are unused.
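One way to narrow down what is occupying the memory is to rank the slab
caches by approximate footprint and diff the two attached snapshots
(slabinfo.reverted vs slabinfo.vanila). This is only a rough sketch and
assumes the standard /proc/slabinfo v2.1 column layout (name, active_objs,
num_objs, objsize, ...); memory that is not slab-backed, such as the
driver's receive pages with MTU 9000, will not show up here and is better
judged from numastat -m and the ethtool -S output requested above.

  # approximate per-cache footprint = num_objs * objsize, top 20 entries
  $ rank_slabs() { awk 'NR>2 { printf "%10.1f MiB  %s\n", $3*$4/1048576, $1 }' "$1" | sort -rn | head -20; }
  $ rank_slabs slabinfo.reverted > slabs.reverted.top   # working boot
  $ rank_slabs slabinfo.vanila   > slabs.vanila.top     # non-working boot
  $ diff -u slabs.reverted.top slabs.vanila.top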
>
> See the attached files for the working case:
> ethtool_-S_em1.reverted
> ethtool_-S_em2.reverted
> ethtool_-S_p3p1.reverted
> ethtool_-S_p3p2.reverted
>
> See the attached files for the non-working case:
> ethtool_-S_em1.vanila
> ethtool_-S_em2.vanila
> ethtool_-S_p3p1.vanila
> ethtool_-S_p3p2.vanila
>
>> Do you have a bugzilla.kernel.org bug yet where you can upload larger
>> files like dmesg and others?
>
> I do not have one yet; I will create one and ping you then.
>
>> Also, I'm curious if your problem goes away if you change / reduce the
>> number of queues per port. Use ethtool -L eth0 combined 4 ?
>
> I will try that and give you feedback soon.
>
>> You also said something about reproducing when launching / destroying
>> virtual machines with VF passthrough?
>
> The memory usage is there from boot, without any VMs running. The issue
> is that the host has low memory for itself and starts to use kswapd
> once we begin using it by starting VMs.
>
>> Can you reproduce the issue without starting qemu (just doing bare-metal
>> SR-IOV instance creation/destruction via
>> /sys/class/net/eth0/device/sriov_numvfs)?
>
> Yes, we can reproduce it without qemu running; the extra memory usage
> is there from the beginning, right after boot, and does not depend on
> any running VM.
>
> We do not use SR-IOV.
>
>> Thanks
>
> Thanks,
> Jaroslav Pulchart
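A rough sketch of the two experiments suggested above, using the port
names from the bonding description; the queue count of 4 is just the value
suggested in the thread, and per the report SR-IOV is otherwise unused, so
the VF step is purely for the reproduction test:

  # check, then reduce, the channel (queue) count on the active ports
  $ ethtool -l em1
  $ ethtool -L em1 combined 4
  $ ethtool -L p3p1 combined 4

  # bare-metal VF create/destroy without qemu
  $ echo 4 > /sys/class/net/em1/device/sriov_numvfs
  $ echo 0 > /sys/class/net/em1/device/sriov_numvfs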