Is it possible these applications are using mmap() to do the I/O?  I'm not sure 
if mmap is (or can be) effectively tracked at the user/kernel interface (which 
is what the llite stats are showing).

You _might_ be able to see the page faults in the "vmstat 1" output?

I'm of course happy to be proven wrong by adding stats counters for this (e.g. 
count page faults, etc.).
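
To check whether mmap-style access produces page faults rather than read() 
calls, something like the following rough sketch could be run against a file 
on the Lustre mount. It is only an illustration: `read_via_mmap` is a made-up 
helper name, it relies on Python's Unix-only `resource` module, and the fault 
counts cover the whole process, so they are an upper bound for the file itself.

```python
import mmap
import os
import resource


def read_via_mmap(path, stride=mmap.PAGESIZE):
    """Touch one byte per page of `path` through a read-only mapping.

    Returns (file_size, minor_faults, major_faults), where the fault
    counts are process-wide deltas taken around the access loop.
    The file data is fetched by page faults, not by read() syscalls.
    """
    size = os.path.getsize(path)
    before = resource.getrusage(resource.RUSAGE_SELF)
    checksum = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file; an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, stride):
                checksum ^= m[offset]  # forces the page to be resident
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (
        size,
        after.ru_minflt - before.ru_minflt,
        after.ru_majflt - before.ru_majflt,
    )
```

Comparing the llite stats before and after such a run would show whether 
`read_bytes` moves at all for mapped access.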

Cheers, Andreas

On Nov 19, 2024, at 07:55, Martin, Philipp <pm.mar...@itc.rwth-aachen.de> wrote:


Hi all,


We are having an issue where the statistics file in `.../lustre/llite/*/stats` 
does not show the read or write bytes for some traffic.

File opens & closes are being recorded and the read/write activity is shown in 
the `osc/*/stats` files as expected, but it would be more convenient to see the 
aggregated results rather than having to sum up the data for every storage 
target.
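
As a stopgap, the per-target sum can be scripted. The sketch below is an 
illustration only: `sum_stat_bytes` is a hypothetical helper, the glob pattern 
is left to the caller, and it assumes each counter line has the layout 
`name count samples [unit] min max sum ...`, which should be checked against 
the actual stats files on the client.

```python
import glob


def sum_stat_bytes(pattern, counters=("read_bytes", "write_bytes")):
    """Sum the 'sum' column of selected counters over all matching stats files.

    `pattern` is a glob for the per-target stats files, supplied by the caller.
    """
    totals = {name: 0 for name in counters}
    for path in glob.glob(pattern):
        with open(path) as stats_file:
            for line in stats_file:
                fields = line.split()
                # Assumed layout: name count "samples" [unit] min max sum ...
                if len(fields) >= 7 and fields[0] in totals:
                    totals[fields[0]] += int(fields[6])
    return totals
```

Calling it with a pattern that matches the `osc/*/stats` files returns one 
total per counter across all targets.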


What would cause traffic to not be shown under llite? Can certain I/O bypass 
the LLITE subsystem?

For reference, we have noticed this specifically for machine learning tools 
using PyTorch and NVIDIA DALI.


I'd be grateful for any hints!

Philipp

_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org