Assuming all nodes need to run the same task once...
How about -n num_of_nodes --ntasks-per-node=1 ?
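For example, a minimal sketch (the node count and the hostname command are
placeholders for your real test):

    # run one copy of the task on each of 4 nodes
    srun -n 4 --ntasks-per-node=1 hostname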
Otherwise, if it is more deployment-related, I would use Ansible to do that.
S. Zhang
On 2025/02/19 2:37, John Hearns via slurm-users wrote:
I am running single node tests on a cluster.
I can se
Arnuld,
You may be looking for the srun parameter or configuration option
"--oversubscribe" for CPUs, as those are the limiting factor now.
S. Zhang
On 2024/06/21 2:48, Brian Andrus via slurm-users wrote:
Well, if I am reading this right, it makes sense.
Every job will need at least 1 core j
Hi Arnuld,
What I would probably do is build one for each distro and install
them either directly into /usr/local or as a deb package.
The DEBIAN/control file is used by apt to manage a couple of things, such as
indexing so that apt search shows what this package is for, which package it
could rep
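As a rough sketch, a minimal DEBIAN/control might look like this (the package
name, version, maintainer and dependencies below are made up for illustration):

    Package: slurm-local
    Version: 23.02.6-1
    Section: admin
    Priority: optional
    Architecture: amd64
    Maintainer: Your Name <you@example.com>
    Depends: munge
    Description: locally built Slurm binaries
     Slurm built from source for this site.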
Hi Arnuld
It is most important to keep the Slurm version the same across the board.
As you mention the "deb" package, I am assuming all of your nodes run a
Debian-based distribution, which should be close enough to each other.
However, Debian-based distros are not as "binary compatible" a
Hi Ravi
Unfortunately, if the NVML flag is off at compile time (when the maintainer
built the apt package for you to install), that part of the code would not be in
your binary. Recompile it yourself following the official documentation, or find
some repository that builds Slurm with NVML are
Hi all,
Apologies for writing something misleading in the last mail. I missed your
error message.
Rob was correct - your slurmd appears not to have had the NVML flag on at
compile time.
You need to set up NVML and turn the --with-nvml flag on when
configuring Slurm to fix the issue if you are compil
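Roughly, the rebuild looks something like the sketch below (the prefix and
paths are assumptions; check the official build documentation for your version):

    # rebuild Slurm with NVML-based GPU autodetection enabled
    ./configure --prefix=/usr --sysconfdir=/etc/slurm --with-nvml
    make -j"$(nproc)"
    sudo make install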
Hi all,
If you could offer a little more detail on your OS and Slurm version,
that might shed some light.
There is an interesting detail about the NVML package if you are using a
RHEL-like OS.
The NVML detection part of the Slurm library (/usr/lib64/slurm/gpu_nvml.so)
is linked against the /lib
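As a quick check on an affected node, you can inspect that plugin directly
(this just assumes ldd is available):

    # see which NVML (libnvidia-ml) library the plugin was linked against
    ldd /usr/lib64/slurm/gpu_nvml.so | grep -i nvidia-ml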
Hi Steve,
The requirements for a client node, as I tested, are:
* munge daemon for auth
* a mechanism for the client to obtain the configuration
So yes, I believe you would need munge working on the submitting machine.
For the configuration, I used to keep a copy of the slurm config in
/etc/slurm in the cli
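A rough sketch of the client-side setup (the hostname "controller" and the
package manager commands are assumptions for illustration):

    # 1. auth: install munge and copy the cluster's munge key
    apt install munge
    scp controller:/etc/munge/munge.key /etc/munge/munge.key
    systemctl enable --now munge
    # 2. configuration: keep a local copy of the cluster's slurm.conf
    scp controller:/etc/slurm/slurm.conf /etc/slurm/slurm.conf
    sinfo    # client commands should now reach the controller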
# reset the XFS project quota that was assigned to this job's scratch directory
xfs_quota -x -c "limit -p bsoft=0m bhard=0m ${SLURM_JOBID}" ${local_dir}
# remove the per-job scratch folder
if [[ -d ${SLURM_TMPDIR} ]]; then
    rm -rf --one-file-system ${SLURM_TMPDIR}
fi
exit 0
In order to use project quotas, you would need to activate them by using
this mount flag: pquota in
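For example, in /etc/fstab (the device and mount point are placeholders):

    # enable XFS project quotas on the local scratch filesystem
    /dev/nvme0n1p1  /local_scratch  xfs  defaults,pquota  0 0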
Hi all,
I am attempting to set up a GRES to manage jobs that need
scratch space, but only a few of our computational nodes are
equipped with SSDs for such scratch space. Originally I set up a new
partition for those IO-bound jobs, but it ended up that those jobs
might be allocated to the same node
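For reference, a count-only GRES of this kind might be sketched as follows
(the name "tmpdisk", the node names and the counts are made up for illustration):

    # slurm.conf
    GresTypes=tmpdisk
    # add Gres=tmpdisk:800 to the existing NodeName lines of the SSD nodes
    NodeName=node[01-04] Gres=tmpdisk:800

    # gres.conf on those nodes (count-only GRES, no device file)
    NodeName=node[01-04] Name=tmpdisk Count=800

    # jobs then request their share of scratch
    sbatch --gres=tmpdisk:100 job.sh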
Hi
Would you mind checking your job scheduling settings in slurm.conf?
Namely SelectTypeParameters=CR_CPU_Memory or the like.
Also, you may want to use systemd-cgtop to at least confirm jobs are indeed
running in cgroups.
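For instance, something like the following in slurm.conf (whether to use
cons_tres or cons_res depends on your Slurm version):

    # allocate individual CPUs and memory rather than whole nodes
    SelectType=select/cons_tres
    SelectTypeParameters=CR_CPU_Memory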
Sincerely,
S. Zhang
On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky
The error message says that Slurm cannot find the slurm config file. Do you have
a local copy of /etc/slurm/*, share /etc/slurm across NFS, or use DNS and
configless Slurm?
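For a quick test, something like this sketch (the path and the controller
hostname are placeholders):

    # point the client commands at an explicit config file
    export SLURM_CONF=/etc/slurm/slurm.conf
    sinfo
    # or, for configless mode, publish a DNS SRV record along the lines of:
    #   _slurmctld._tcp 3600 IN SRV 10 0 6817 slurmctl-host.example.com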
Sincerely,
S. Zhang
> Sorin Draga wrote on 2023/03/14 18:49:
>
>
> Hello everyone,
>
> I'm trying to run the new Debian i