> On Mon, Apr 14, 2025 at 06:29:01PM +0200, Jaroslav Pulchart wrote:
> > Hello,
> >
> > While investigating increased memory usage after upgrading our
> > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> > a regression in available memory per NUMA node. Our servers allocate
> > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> > for the host OS.
> >
> > After the upgrade, we noticed approximately 500MB less free RAM on
> > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> > the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's
> > a snapshot of the NUMA stats on vanilla 6.13.y:
> >
> > NUMA nodes:     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
> > HPFreeGiB:     60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> > MemTotal:   64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> > MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
> >
> > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> > "ice: Add support for persistent NAPI config".
> >
> > We limit the number of channels on the NICs to match local NUMA cores
> > or less if unused interface (from ridiculous 96 default), for example:
> >    ethtool -L em1 combined 6       # active port; from 96
> >    ethtool -L p3p2 combined 2      # unused port; from 96
> >
> > This typically aligns memory use with local CPUs and keeps NUMA-local
> > memory usage within expected limits. However, starting with kernel
> > 6.13.y and this commit, the high memory usage by the ICE driver
> > persists regardless of reduced channel configuration.
> >
> > Reverting the commit restores expected memory availability on nodes 0
> > and 2.
> > Below are stats from 6.13.y with the commit reverted:
> > NUMA nodes:     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
> > HPFreeGiB:     60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> > MemTotal:   64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> > MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596 ...
> >
> > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> > 6.12.y, and avoids swap pressure and memory exhaustion when running
> > services and VMs.
> >
> > I also do not see any practical benefit in persisting the channel
> > memory allocation. After a fresh server reboot, channels are not
> > explicitly configured, and the system will not automatically resize
> > them back to a higher count unless manually set again. Therefore,
> > retaining the previous memory footprint appears unnecessary and
> > potentially harmful in memory-constrained environments
> >
> > Best regards,
> > Jaroslav Pulchart
> >
>
> Hello Jaroslav,
>
> I have just sent a series for converting the Rx path of the ice driver
> to use the Page Pool.
> We suspect it may help for the memory consumption issue since it removes
> the problematic code and delegates some memory management to the generic
> code.
>
> Could you please give it a try and check if it helps for your issue.
> The link to the series:
> https://lore.kernel.org/intel-wired-lan/[email protected]/

I can try it; however, I cannot apply the series as-is on top of 6.15.y:

$ git am ~/ice-convert-Rx-path-to-Page-Pool.patch
Applying: ice: remove legacy Rx and construct SKB
Applying: ice: drop page splitting and recycling
error: patch failed: drivers/net/ethernet/intel/ice/ice_txrx.h:480
error: drivers/net/ethernet/intel/ice/ice_txrx.h: patch does not apply
Patch failed at 0002 ice: drop page splitting and recycling
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

>
> Thanks,
> Michal
>
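
For comparing runs once the series applies, here is a minimal sketch for collecting per-node MemFree in MiB. It assumes the standard per-node meminfo files under /sys/devices/system/node; the function name numa_memfree and its optional root-directory argument are only illustrative, and this is not necessarily how the tables earlier in the thread were produced.

```shell
#!/bin/sh
# Print "<node> <MemFree in MiB>" for every NUMA node, sorted by node number.
# The optional first argument is a hypothetical override of the sysfs root,
# useful for pointing the function at a saved copy of the meminfo files.
numa_memfree() {
    root="${1:-/sys/devices/system/node}"
    for f in "$root"/node[0-9]*/meminfo; do
        # Skip the unexpanded glob when no node directories exist.
        [ -r "$f" ] || continue
        # Per-node lines look like: "Node 0 MemFree:   2860032 kB"
        awk '$3 == "MemFree:" { printf "%s %d\n", $2, $4 / 1024 }' "$f"
    done | sort -n
}

# Example: numa_memfree > memfree-$(uname -r).txt
```

Saving the output once per kernel right after boot and diffing the two files should show per-node deltas directly.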
