On 19.02.24 12:40, Jaroslav Pulchart wrote:
> If the question is for me then my opinion about it is this:
>
> I'm fine with the behaviour of a driver about memory consumption if it
> is predictable/described with the possibility to change it from
> default values. My understanding of "predictable" is something like
> this:
>
> The ICE driver is going to
> * Setup 64 queues per each port, not active included.
> * Each queue consumes "xxxx MB" amount of kernel memory per each numa node.
> example: Two 2 ports Intel NICs using ICE driver will consume ~6GB of
> RAM of each NUMA node, please consider changing the defaults values to
> avoid OOM :-).

6GB of RAM for each NUMA node? That sounds, well, a whole lot :-D Makes
me also wonder a bit why nobody else reported this (and whether you have
any debug option enabled in your .config, or something like that which
is rarely used).

Whatever: to me it still feels like this regression is not handled as
Linus would want it, but I'm not totally sure and guess I have to admit
that I'm out of my depth here. I'll let my regression tracking bot
continue to monitor this, but will most likely leave things to the
network and driver maintainers from here on unless something changes.

Ciao, Thorsten

> On Mon, 19 Feb 2024 at 12:29, Thorsten Leemhuis
> <regressi...@leemhuis.info> wrote:
>>
>> On 01.02.24 18:19, Jaroslav Pulchart wrote:
>>>>
>>>> On Wed, 24 Jan 2024 15:29:38 +0100 Linux regression tracking (Thorsten
>>>> Leemhuis) wrote:
>>>>>>> I think that's a bad bisect. There is no reason I could understand for
>>>>>>> that change to cause a continuous or large leak, it really doesn't make
>>>>>>> any sense. Reverting it consistently helps? You're not just rewinding
>>>>>>> the tree back to that point, right? just running 6.6.9 without that
>>>>>>> patch? (sorry for being pedantic, just trying to be certain)
>>>>>>
>>>>>> Reverting just the single bisected commit continuously helps for >=
>>>>>> 6.6.9 and as well for current 6.7.
>>>>>> We cannot use any new linux kernel without reverting it due to this
>>>>>> extra memory utilization.
>>>>>
>>>>> Quick query: what's the status wrt to this regression? Looks like
>>>>> nothing happened in the past week.
>>>>
>>>> Is someone working on this? Indeed the commit in question looks
>>>> harmless but can't argue with the revert helping :S
>>>
>>> No clue if someone is working on it,
>>
>> Yeah, a quick public status update would be really helpful. And maybe
>> some debugging tips that might enable Jaroslav to pinpoint the real
>> problem.
>>
>>> however the commit itself is a
>>> trigger of some other issue.
>>>
>>> The analysis of my colleague Igor (see previous email) shows the
>>> memory consumption is caused by queues of each ice network interface
>>> (even the unused ones). Our final fix was to lower the queues to 6 for
>>> used interfaces and 2 of unused interfaces manually.
>>
>> Despite the above allow me to ask: Can you live with that workaround?
>> Ideally of course this should be fixed, but well, the world sometimes is
>> a tricky place. :-/
>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>> --
>> Everything you wanna know about Linux kernel regression tracking:
>> https://linux-regtracking.leemhuis.info/about/#tldr
>> If I did something stupid, please tell me, as explained on that page.
>>
>> #regzbot poke
>
>
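
For readers who want to try the workaround quoted above (6 queues on used
interfaces, 2 on unused ones), here is a minimal sketch of how the per-interface
commands might be generated. The thread does not say how the queue counts were
actually lowered; using `ethtool -L <iface> combined <n>` is an assumption, and
so is the driver accepting combined channel counts on the hardware in question.
The interface names and the `queue_commands` helper are hypothetical. The script
only prints the commands, it does not run them.

```python
import shlex

# Queue counts taken from the workaround described in the thread.
USED_QUEUES = 6
UNUSED_QUEUES = 2


def queue_commands(interfaces):
    """Return one `ethtool -L` command line per interface.

    `interfaces` maps an interface name to True (carries traffic) or
    False (unused). The names are placeholders, not from the thread.
    """
    commands = []
    for name, in_use in sorted(interfaces.items()):
        count = USED_QUEUES if in_use else UNUSED_QUEUES
        # Assumption: queues are set via ethtool's "combined" channel count.
        argv = ["ethtool", "-L", name, "combined", str(count)]
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Hypothetical interface names for a host with two 2-port NICs.
    example = {"eth0": True, "eth1": False, "eth2": True, "eth3": False}
    for line in queue_commands(example):
        print(line)
```

Printing instead of executing keeps the sketch safe to run anywhere; the
resulting lines would need root privileges and, presumably, some boot-time
hook to persist across reboots.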