Mostly, our problem was that we forgot to add or remove a node to/from the
partitions/topology file, which caused slurmctld to refuse to start. So I wrote a
simple checker for that. Here is the output of a sample run:
reading '../conf/rcc/slurm.conf' ...
reading '../conf/rcc/nodes.conf' ...
reading
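A rough sketch of what such a checker could look like in Python (the file names,
the exact NodeName/PartitionName parsing, and the option names handled here are my
own assumptions, and hostlist ranges like i[004-050] are compared verbatim rather
than expanded):

#!/usr/bin/env python3
"""Rough consistency check: every NodeName should appear in some partition's
Nodes= list, and every node referenced by a partition should be defined.
Hostlist ranges such as n[001-010] are NOT expanded -- they are compared as-is."""
import re
import sys

def parse(path):
    nodes, part_nodes = set(), set()
    with open(path) as fh:
        for line in fh:
            line = line.split('#', 1)[0].strip()
            if line.lower().startswith('nodename='):
                # first token after NodeName= is the node name (or range)
                nodes.add(re.split(r'[\s=]+', line, maxsplit=2)[1])
            elif line.lower().startswith('partitionname='):
                m = re.search(r'\bNodes=(\S+)', line, re.IGNORECASE)
                if m:
                    part_nodes.update(m.group(1).split(','))
    return nodes, part_nodes

if __name__ == '__main__':
    nodes, part_nodes = set(), set()
    for path in sys.argv[1:]:
        print(f"reading '{path}' ...")
        n, p = parse(path)
        nodes |= n
        part_nodes |= p
    nodes.discard('DEFAULT')        # NodeName=DEFAULT is not a real node
    part_nodes -= {'ALL', 'DEFAULT'}
    for n in sorted(nodes - part_nodes):
        print(f'node {n} is defined but not in any partition')
    for n in sorted(part_nodes - nodes):
        print(f'partition references undefined node {n}')

Run it as, e.g., "check_nodes.py ../conf/rcc/slurm.conf ../conf/rcc/nodes.conf";
a real version would also have to expand bracketed hostlist ranges before comparing.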
Hi Matt,
How about this sinfo command:
$ sinfo -O NodeList:30,Features:30,StateLong
NODELIST                      AVAIL_FEATURES                STATE
i023                          xeon2650v2,infiniband,xeon16  draining@
i[004-022,024-050]            xeon2650v2,infiniband,xeon16  allocated
Hi Matt,
you may have a look at the sinfo/squeue commands with the --format / -o
output options, e.g.:
[root@ma1 slurm]# sinfo -t idle -o "%P %.5a %.10l %.6D %.6t %N %b"
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST ACTIVE_FEATURES
compute up 8:00:00 44 idle m[10474-10475,10594-105
All,
I work on a cluster that uses SLURM, which has various types of nodes that
are controlled via --constraint flags in sbatch.
Now I started thinking, "How can I figure out how many jobs are
running/pending/etc. on a certain type of node?" My first, obvious thought was
"squeue --constraint=fo
Hello
We have four Xeon Phi (KNL) nodes with 64 SMT-4 cores each (256
hyperthreads total). They are configured in different KNL modes
(SNC4/flat, SNC4/cache, all2all/flat and all2all/cache). The node that
is in SNC4/flat won't let us allocate all 256 hyperthreads. Half the
cores only get 2 hy