Hi there,
We tested an MPI allreduce job in three modes (srun-dtcp, mpirun-slurm,
mpirun-ssh), and we found that the job running time in the mpirun-ssh mode is
shorter than in the other two modes.
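(For context, the three launch modes are assumed to correspond roughly to the commands below; the benchmark binary, node/task counts and hostfile are placeholders rather than the actual job.)

    # srun: Slurm starts the MPI ranks directly
    srun -N 4 -n 128 --mpi=pmix ./mpi_allreduce_bench

    # mpirun inside a Slurm allocation: Open MPI detects Slurm and launches through it
    salloc -N 4 -n 128 mpirun -np 128 ./mpi_allreduce_bench

    # mpirun over ssh, bypassing the Slurm launcher entirely
    mpirun -np 128 --hostfile hosts.txt --mca plm rsh ./mpi_allreduce_bench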
We have set the following parameters:
/usr/lib/systemd/system/slurmd.service:
LimitMEMLOCK=i
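(On IB clusters the slurmd locked-memory limit is commonly raised to infinity so that RDMA can register memory; the drop-in below is a typical example of that setting, not necessarily the value used here.)

    # /etc/systemd/system/slurmd.service.d/memlock.conf (illustrative drop-in path)
    [Service]
    LimitMEMLOCK=infinity

    # apply the change
    systemctl daemon-reload
    systemctl restart slurmd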
Hi,
When I was testing slurm-19.05.3 with openmpi-4.0.1, pmix-3.1.3rc4 and
ucx-1.6.1 (with IB), I got a different error from the one in Bug 7646
(https://bugs.schedmd.com/show_bug.cgi?id=7646). At first, a job like
"srun --mpi=pmix_v3 xxx" could run with "SLURM_PMIX_DIRECT_CONN=true" and
"SLURM_PMIX_DIR
Hi there,
I have two jobs in my cluster, which has 32 cores per compute node. The
first job uses eight nodes and 256 cores, which means it takes up all eight
nodes completely. The second job uses five nodes and 32 cores, which means
only some of the cores on each of the five nodes will be used. Slurm, however, alloc
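(For reference, the two job shapes described above correspond to submissions roughly like the following; the script names are placeholders.)

    # Job 1: 8 nodes x 32 cores = 256 tasks, fills all eight nodes
    sbatch -N 8 -n 256 job1.sh

    # Job 2: 32 tasks spread over 5 nodes, so only part of each node is used
    sbatch -N 5 -n 32 job2.sh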
Hi there,
I have a cluster with 9 nodes (cmbc[1530-1538]); each node has 2 CPUs and
each CPU has 32 cores. When I submitted a heterogeneous job twice, the second
job terminated unexpectedly. This problem has been bothering me all day. The
Slurm version is 18.08.5 and here is the job:
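(A generic sketch of the 18.08 heterogeneous-job syntax, not the actual script from this report; the binaries and sizes are placeholders.)

    # srun form: components separated by ":"
    srun -N 2 -n 64 ./app_a : -N 1 -n 32 ./app_b

    # sbatch form: "#SBATCH packjob" separates the per-component resource requests
    #SBATCH -N 2 -n 64
    #SBATCH packjob
    #SBATCH -N 1 -n 32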
Hi there,
How can I view the CPU usage of historical jobs on each compute node? This
command (scontrol show job <jobid> --details), however, can only get the CPU
usage of the currently running job on each compute node:
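(For reference, the detailed running-job query has the form below, while completed jobs are normally read back from the accounting database with sacct; the job id and field list are illustrative.)

    # detailed view of a running job, including per-node CPU_IDs
    scontrol -d show job 12345

    # accounting record of a completed job
    sacct -j 12345 --format=JobID,NodeList,AllocCPUS,TotalCPU,Elapsed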
Appreciatively,
Menglong