Re: [lustre-discuss] good ways to identify clients causing problems?

Bill Anderson via lustre-discuss Tue, 04 May 2021 12:48:16 -0700

    Thank you so much!



On Tue, May 4, 2021 at 1:31 PM Andreas Dilger <[email protected]> wrote:

> On May 4, 2021, at 12:41, Bill Anderson via lustre-discuss <
> [email protected]> wrote:
>
>
>    Hi All,
>
>    Can you recommend good ways to identify Lustre client hosts that might
> be causing stability or performance problems for the entire filesystem?
>
>    For example, if a user is inadvertently doing something that's creating
> an RPC storm, what are good ways to identify the client host that has
> triggered the storm?
>
>
> If you have a JobID enabled on the clients (which can be done even if they
> are not batch scheduled, like "procname_uid" for login nodes), then you can
> watch "lctl get_param *.*.job_stats | grep -v ' 0, unit:'" (to filter out
> unused stats) to see if there are *jobs* which put a high RPC load on that
> server.
>
> If you are looking for a particular *client* you can look at "lctl
> get_param *.*.exports.*.stats" to see if any are driving a lot of RPCs,
> possibly after clearing those stats with "lctl set_param
> *.*.exports.*.stats=0".
>
> If you feel inclined, it would be quite useful to add a mode to the
> "llstat" utility to be able to read and aggregate stats from e.g. all the
> "exports.*.stats" files and show the top users by NID and RPC count.  I
> think several people have made scripts to this effect (you might even find
> some on GitHub), but nobody has ever submitted one for inclusion in the
> repo for everyone to use.  There are more elaborate monitoring systems
> (e.g. IML, lltop, Grafana) that need agents installed, central monitoring,
> etc., but having a simple "check load on the local node like 'top'" tool
> would still be helpful.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
>

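For anyone who wants to try the per-NID aggregation suggested above, here is a
minimal sketch in Python. It is not part of Lustre or llstat. It assumes that
"lctl get_param *.*.exports.*.stats" prints a "<target>.exports.<NID>.stats="
header line for each export, followed by lines of the form
"<name> <count> samples [<unit>] ...". It sums the sample counts as a rough
proxy for RPC count. The function names are hypothetical, and the assumed
format should be checked against the output of your own Lustre version.

```python
import re
import subprocess
from collections import Counter

# Assumed header line, e.g. "obdfilter.fs-OST0000.exports.10.0.0.1@tcp.stats="
HEADER = re.compile(r"^(?P<target>\S+?)\.exports\.(?P<nid>\S+)\.stats=\s*$")
# Assumed counter line, e.g. "read_bytes  12 samples [bytes] 4096 1048576 5242880"
SAMPLE = re.compile(r"^(?P<name>\S+)\s+(?P<count>\d+)\s+samples\b")


def rpc_counts_by_nid(text):
    """Sum the sample counts of every counter, grouped by client NID."""
    totals = Counter()
    nid = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            # A new export section starts; later counters belong to this NID.
            nid = header.group("nid")
            continue
        sample = SAMPLE.match(line)
        if sample and nid is not None:
            totals[nid] += int(sample.group("count"))
    return totals


def top_clients(text, n=10):
    """Return the n busiest NIDs as (nid, total_samples) pairs."""
    return rpc_counts_by_nid(text).most_common(n)


def print_top_clients(n=10):
    """Run lctl on the local server and print the busiest clients."""
    result = subprocess.run(
        ["lctl", "get_param", "*.*.exports.*.stats"],
        capture_output=True, text=True, check=True,
    )
    for nid, count in top_clients(result.stdout, n):
        print(f"{count:>12}  {nid}")
```

Calling print_top_clients() on an MDS or OSS (as a user allowed to run lctl)
prints one line per client, busiest first. Clearing the stats beforehand, as
described in the quoted reply, makes the totals reflect recent activity only.
The same NID is summed across all targets on the node, so a client hitting
several OSTs on one OSS shows up once.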
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
