Hi Ole,

Ole Holm Nielsen writes:

> As a small contribution to the Slurm community, I've moved my collection of
> Slurm tools to GitHub at https://github.com/OleHolmNielsen/Slurm_tools.  These
> are tools which I feel make the daily cluster monitoring and management a
> little easier.
>
> The following Slurm tools are available:
>
> * pestat Prints a Slurm cluster nodes status with 1 line per node and job info.
>
> * slurmreportmonth Generate monthly accounting statistics from Slurm using the
> sreport command.
>
> * showuserjobs Print the current node status and batch jobs status broken down
> into userids.
>
> * slurmibtopology Infiniband topology tool for Slurm.
>
> * Slurm triggers scripts.
>
> * Scripts for managing nodes.
>
> * Scripts for managing jobs.
>
> The tools "pestat" and "slurmibtopology" have previously been announced to this
> list, but future updates will be on GitHub only.
>
> I would also like to mention our Slurm deployment HowTo guide at
> https://wiki.fysik.dtu.dk/niflheim/SLURM
>
> /Ole

Thanks for sharing your tools.  Here are some brief comments:

- psjob/psnode
  - The USERLIST variable makes the commands a bit brittle, since ps
    will fail if you pass an unknown username.
- showuserjobs
  - Doesn't handle usernames longer than 8 characters (we have longer names).
  - The grouping doesn't seem quite correct.  As shown in the example
    below, not all users appear under the GROUP_TOTAL line for their own
    group (e.g. user02 and user04 are in group01 but are listed under
    group02 and group03):
  
    Username    Jobs  CPUs   Jobs  CPUs  Group     Further info
    ========    ==== =====   ==== =====  ========  =============================
    GRAND_TOTAL  168  1089     55   451  ALL       running+idle=1540 CPUs 29 users
    GROUP_TOTAL   56   349     10   119  group01   running+idle=468 CPUs 8 users
    user01        27   324      4    52  group02   One, User
    GROUP_TOTAL   27   324      4    52  group02   running+idle=376 CPUs 1 users
    user02        29   174      1     6  group01   Two, User
    GROUP_TOTAL    5   148     18   208  group03   running+idle=356 CPUs 4 users
    user03         3   120     16   176  group03   Three, User
    user04        11    96      3    48  group01   Four, User
    ...
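
A minimal sketch of one way to get the grouping right, written in Python purely for illustration. It assumes the per-user records (username, group, jobs, CPUs) have already been collected, e.g. from squeue; the UserJobs record, format_report() and the sample data are all hypothetical. Users are bucketed by group so that each user lands directly under its own GROUP_TOTAL line, and the username column is sized from the longest name rather than a fixed 8 characters.

```python
from collections import defaultdict
from typing import NamedTuple


class UserJobs(NamedTuple):
    # Hypothetical per-user record; real data would come from squeue.
    user: str
    group: str
    jobs: int
    cpus: int


def format_report(records: list[UserJobs]) -> list[str]:
    """Return report lines with each user listed directly under its group total."""
    # Size the name column from the longest label, not a fixed 8 characters.
    width = max([len("GROUP_TOTAL")] + [len(r.user) for r in records])

    # Bucket users by their own group before printing anything.
    by_group: dict[str, list[UserJobs]] = defaultdict(list)
    for r in records:
        by_group[r.group].append(r)

    def group_cpus(g: str) -> int:
        return sum(r.cpus for r in by_group[g])

    lines = []
    # Groups in descending order of CPU usage.
    for group in sorted(by_group, key=group_cpus, reverse=True):
        members = by_group[group]
        jobs = sum(r.jobs for r in members)
        lines.append(f"{'GROUP_TOTAL':<{width}} {jobs:5d} {group_cpus(group):5d}  {group}")
        # Only users of *this* group, busiest first.
        for r in sorted(members, key=lambda r: r.cpus, reverse=True):
            lines.append(f"{r.user:<{width}} {r.jobs:5d} {r.cpus:5d}  {r.group}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample, loosely based on the output above,
    # with one name lengthened to exercise the column width.
    sample = [
        UserJobs("user01", "group02", 27, 324),
        UserJobs("user02", "group01", 29, 174),
        UserJobs("user03", "group03", 3, 120),
        UserJobs("a_rather_long_username", "group01", 11, 96),
    ]
    print("\n".join(format_report(sample)))
```

The key point is that totals and member rows are both driven from the same per-group bucket, so a user can never end up under another group's total.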
    
In general, it might be good to have a common config file in which things such
as paths to binaries, USERLIST and username lengths are defined.
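
A minimal sketch of what such a shared config could look like, again in Python only for illustration. It assumes an INI-style file parsed with the standard configparser module; the file contents and the section and key names (paths, display, users, username_width, userlist) are hypothetical and not something the tools currently read. It also drops unknown usernames from USERLIST via pwd.getpwnam() before they reach ps, which would avoid the psjob/psnode failure mentioned above.

```python
import configparser
import pwd

# Hypothetical shared config for all the Slurm tools.
SAMPLE_CONFIG = """
[paths]
squeue = /usr/bin/squeue
ps = /bin/ps

[display]
username_width = 12

[users]
userlist = root nosuchuser_xyz
"""


def load_config(text: str) -> configparser.ConfigParser:
    """Parse INI-style config text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    return cfg


def known_users(names: list[str]) -> list[str]:
    """Keep only usernames that exist locally, so ps is never handed an unknown one."""
    valid = []
    for name in names:
        try:
            pwd.getpwnam(name)
        except KeyError:
            continue  # unknown user: skip it instead of letting ps fail
        valid.append(name)
    return valid


if __name__ == "__main__":
    cfg = load_config(SAMPLE_CONFIG)
    width = cfg.getint("display", "username_width")
    users = known_users(cfg.get("users", "userlist").split())
    if users:
        # ps -u accepts a comma-separated list of users.
        cmd = [cfg.get("paths", "ps"), "-u", ",".join(users), "-o", "pid,user,comm"]
        print(width, cmd)
    else:
        print("no valid users in USERLIST")
```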

Cheers,

Loris

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
