Hi Pieter,

On 1/11/2019 12:41 AM, Pieter Noordhuis wrote:
> I'm looking into an issue with mlx5 on 4.11.3. It is triggered by high memory 
> pressure but continues for long after the memory pressure is gone. It starts 
> to continuously use pfmemalloc pages, some of which appear to be coming from 
> an RX queue's page cache.
> 
> Attached is a log file showing a second-by-second diff of ethtool counters 
> for a single RX queue that was showing this behavior. This log doesn't 
> capture the start of these drops, because the ethtool monitoring was only 
> started after the first drops were detected. Every increase of the 
> “cache_waive” counter means mlx5 refused to add a page to its page cache 
> because it was a pfmemalloc page. It also means the corresponding packet gets 
> dropped in sk_filter_trim_cap.
> 
> Initially, the log shows the “cache_busy” counter increasing, meaning that 
> the first page in the page cache has >1 references, so can't be used.

Right, this is head-of-queue blocking: the head page cannot be reused, so 
new pages are allocated instead.

> Then after roughly a minute, it switches to increasing the “cache_reuse” and 
> “cache_waive” counters. This means that the pages are coming from the RX 
> queue's page cache *and* are not put back because they are pfmemalloc pages.

This means the head-of-queue page is released and pages are popped from the 
queue, but they fail to get re-pushed due to the mlx5e_page_is_reserved() 
check. So the cache eventually becomes empty.

> This is highly suspicious, as they shouldn't end up in the page cache in the 
> first place. Then, after reusing 255 pages from the page cache, the 
> “cache_empty” counter starts to increase, in lock step with the “cache_waive” 
> counter. This means that the pages are allocated with dev_alloc_pages and not 
> placed in the page cache, because they are pfmemalloc pages. This is also 
> suspicious, because with the memory pressure gone, dev_alloc_pages shouldn't 
> be returning pfmemalloc pages.

Notice that the mlx5e_page_is_reserved() combines two conditions:
[1] page_is_pfmemalloc(page)
[2] page_to_nid(page) != numa_mem_id();

In your case, condition [2] could hold even with the memory pressure gone. 
Can you repro and check that?

> By the time it stops incrementing “cache_waive”, a total of 3804 pages were 
> waived (and packets were dropped), over a duration of 1895 seconds.
> 
> What I would expect to happen is the “cache_reuse” and “cache_waive” to never 
> be incremented in lock step, as pfmemalloc pages must never be added to the 
> RX queue page cache to begin with. Similarly, I would expect “cache_empty” 
> and “cache_waive” to never be incremented in lock step if there is no memory 
> pressure.
> 
> Static analysis of mlx5 on 4.11.3 has so far not led to any insights as to 
> why this is happening. Any help in this investigation is much appreciated. If 
> there is any additional information I can provide, please let me know.

Please try to identify which specific condition of mlx5e_page_is_reserved() 
is firing; you might need to hook/modify the driver. 
If we see that [2] holds, then it would explain the behavior.

> 
> Pieter
> 

Regards,
Tariq
