On 7/8/2025 5:50 PM, Jacob Keller wrote:
>
>
> On 7/7/2025 3:03 PM, Jacob Keller wrote:
>> Bad news: my hypothesis was incorrect.
>>
>> Good news: I can immediately see the problem if I set MTU to 9K and
>> start an iperf3 session and just watch the count of allocations from
>> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
>> if a change is helping.
>>
>> I ported the stats from i40e for tracking the page allocations, and I
>> can see that we're allocating new pages despite not actually performing
>> releases.
>>
>> I don't yet have a good understanding of what causes this, and the logic
>> in ice is pretty hard to track...
>>
>> I'm going to try the page pool patches myself to see if this test bed
>> triggers the same problems. Unfortunately I think I need someone else
>> with more experience with the hotpath code to help figure out whats
>> going wrong here...
>
> I believe I have isolated this and figured out the issue: With 9K MTU,
> sometimes the hardware posts a multi-buffer frame with an extra
> descriptor that has a size of 0 bytes with no data in it. When this
> happens, our logic for tracking buffers fails to free this buffer. We
> then later overwrite the page because we failed to either free or re-use
> the page, and our overwriting logic doesn't verify this.
>
> I will have a fix with a more detailed description posted tomorrow.

@Jaroslav, I've posted a fix which I am reasonably confident resolves the
issue you reported:

https://lore.kernel.org/intel-wired-lan/[email protected]/T/#u

If possible, I would appreciate it if you could test it and report back to
confirm.
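
For anyone following the thread, the failure mode described in the quoted
text can be illustrated with a small user-space model. This is only a sketch
under stated assumptions, not the ice driver code and not the posted fix:
the names toy_desc, toy_ring, toy_rx_frame() and toy_count_held() are
hypothetical, and the model assumes the leak comes from releasing buffers
based on a count of data-carrying fragments rather than on the number of
descriptors the frame actually consumed.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RING_SIZE 8

/* Hypothetical, simplified Rx descriptor (not the hardware layout). */
struct toy_desc {
	unsigned int size;	/* bytes the "hardware" wrote into the buffer */
	int eop;		/* nonzero on the last descriptor of a frame */
};

/* Hypothetical ring state: 1 = buffer taken for a frame, not yet released. */
struct toy_ring {
	int held[TOY_RING_SIZE];
};

/*
 * Consume one frame starting at ring index 0 and return the number of
 * descriptors it used.
 *
 * release_all_used == 0: release only as many buffers as carried data,
 *                        so a zero-size descriptor leaves one buffer held.
 * release_all_used != 0: release every buffer the frame consumed.
 */
static size_t toy_rx_frame(struct toy_ring *ring, const struct toy_desc *descs,
			   size_t ndesc, int release_all_used)
{
	size_t frags = 0;	/* descriptors that carried data */
	size_t used = 0;	/* descriptors consumed by this frame */
	size_t to_release;
	size_t i;

	if (ndesc > TOY_RING_SIZE)
		ndesc = TOY_RING_SIZE;

	for (i = 0; i < ndesc; i++) {
		ring->held[i] = 1;
		used++;
		if (descs[i].size > 0)
			frags++;
		if (descs[i].eop)
			break;
	}

	to_release = release_all_used ? used : frags;
	for (i = 0; i < to_release; i++)
		ring->held[i] = 0;

	return used;
}

/* Count buffers that were taken for a frame but never released. */
static size_t toy_count_held(const struct toy_ring *ring)
{
	size_t n = 0;
	size_t i;

	for (i = 0; i < TOY_RING_SIZE; i++)
		n += ring->held[i] ? 1 : 0;

	return n;
}
```

Feeding this model a three-descriptor frame whose last descriptor has a size
of 0 leaves one buffer held when release_all_used is 0 and none when it is
nonzero, which mirrors the steadily growing allocation count seen in the
iperf3 test.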
