Another option is to install xltop (https://github.com/jhammond/xltop) and use the xltop client (an ncurses-based tool similar to Linux top) to watch for the clients issuing the most requests per second (xltop -k q h). You can also use it to track jobs, but you might have to write your own node-to-job mapping script (xltop-clusterd).
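If you go with the req_history approach Rick describes in the quoted message below, the counting step can also be done in a few lines of Python. This is only a rough sketch under stated assumptions: it takes the client NID from the third colon-separated field, exactly as the quoted `cut -d: -f3` does (so it would not cope with NIDs that themselves contain colons), the function names and the "10x the median" threshold are my own placeholders rather than anything Lustre provides, and the sample lines are made up purely to exercise the code.

```python
import statistics
from collections import Counter


def count_requests_by_client(lines):
    """Count req_history entries per client, keyed on colon-separated field 3."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(":")
        if len(fields) < 3:
            # Skips blank lines and the "ost.OSS.ost_io.req_history=" header.
            continue
        counts[fields[2]] += 1
    return counts


def flag_outliers(counts, factor=10):
    """Return clients whose count is at least `factor` times the median.

    The factor of 10 is an assumed threshold ("an order of magnitude"),
    not a Lustre default; tune it for your site.
    """
    if not counts:
        return []
    median = statistics.median(counts.values())
    return sorted(c for c, n in counts.items() if n >= factor * median)


def make_lines(client, n):
    # Made-up lines; only the position of the client field matters here.
    return ["%d:10.0.0.1@tcp:%s:x%d:224:Complete" % (i, client, i) for i in range(n)]


SAMPLE = (
    ["ost.OSS.ost_io.req_history="]
    + make_lines("12345-10.0.0.11@tcp", 2)
    + make_lines("12345-10.0.0.12@tcp", 3)
    + make_lines("12345-10.0.0.13@tcp", 40)
)

counts = count_requests_by_client(SAMPLE)
# Same ordering as "sort | uniq -c | sort -n": smallest count first.
for client, n in sorted(counts.items(), key=lambda item: item[1]):
    print(n, client)
print("outliers:", flag_outliers(counts))
```

On a real OSS you would pass the lines of the `lctl get_param ost.OSS.ost_io.req_history` output to `count_requests_by_client` instead of `SAMPLE`. Comparing against the median rather than the mean keeps one very noisy client from hiding itself by dragging the average up.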
On Fri, May 28, 2021 at 4:21 PM Mohr, Rick via lustre-discuss <[email protected]> wrote:
>
> Bill,
>
> One option I have used in the past is to look at the rpc request history.
> For example, on an oss server, you can run:
>
> lctl get_param ost.OSS.ost_io.req_history
>
> and then extract the client nid for each request. Based on that, you can
> calculate the number of requests coming into the server and look for any
> clients that are significantly higher than the others. Maybe something like:
>
> lctl get_param ost.OSS.ost_io.req_history | cut -d: -f3 | sort | uniq -c | sort -n
>
> I have used that approach in the past to identify misbehaving clients (the
> number of requests from such clients was usually one or two orders of
> magnitude higher than the others). If multiple clients are unusually high,
> you may be able to correlate the nodes with currently running jobs to
> identify a particular job (assuming you don't already have lustre job stats
> enabled).
>
> -Rick
>
>
> On 5/4/21, 2:41 PM, "lustre-discuss on behalf of Bill Anderson via
> lustre-discuss" <[email protected] on behalf of
> [email protected]> wrote:
>
>
> Hi All,
>
> Can you recommend good ways to identify Lustre client hosts that might
> be causing stability or performance problems for the entire filesystem?
>
> For example, if a user is inadvertently doing something that's
> creating an RPC storm, what are good ways to identify the client host that
> has triggered the storm?
>
> Thank you!
>
> Bill
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
