[slurm-users] Re: Run only one time on a node

2025-02-18 Thread Shunran Zhang via slurm-users
Assuming every node needs to run the same task once, how about -n num_of_nodes --ntasks-per-node=1? Otherwise, if it is more deployment-related, I would use Ansible for that. S. Zhang On 2025/02/19 2:37, John Hearns via slurm-users wrote: I am running single node tests on a cluster. I can se
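
A minimal sketch of the `-n` / `--ntasks-per-node=1` suggestion, assuming a hypothetical cluster of 4 nodes and a placeholder test script `./single_node_test.sh` (neither is from the thread). The batch script asks for one task on each node, so `srun` starts the test exactly once per node; `--nodes=4` is added only to make the node count explicit. The script is written to a file so it can be inspected before submission.

```bash
#!/bin/bash
# Write a batch script that runs one task on each of 4 nodes.
# The node count (4) and ./single_node_test.sh are hypothetical placeholders.
cat > one_per_node.sbatch <<'EOF'
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1
srun ./single_node_test.sh
EOF
# Submit with: sbatch one_per_node.sbatch
```

With the task count equal to the node count and at most one task per node, Slurm has to place the tasks on that many distinct nodes.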

[slurm-users] Re: Can Not Use A Single GPU for Multiple Jobs

2024-06-20 Thread Shunran Zhang via slurm-users
Arnuld, You may be looking for the srun option "--oversubscribe" (or the partition-level OverSubscribe setting in the configuration) for CPUs, as that is the limiting factor now. S. Zhang On 2024/06/21 2:48, Brian Andrus via slurm-users wrote: Well, if I am reading this right, it makes sense. Every job will need at least 1 core j

[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Shunran Zhang via slurm-users
Hi Arnuld, What I would probably do is build one for each distro and install it either directly into /usr/local or as a deb package. The DEBIAN/control file is used by apt to manage a couple of things, such as indexing (so that apt search shows what the package is for), which package it could rep

[slurm-users] Re: Building Slurm debian package vs building from source

2024-05-22 Thread Shunran Zhang via slurm-users
Hi Arnuld, It is most important to keep the Slurm version the same across the board. As you mention the "deb" package, I am assuming all of your nodes run a Debian-based distribution and are close enough to each other. However, Debian-based distros are not as "binary compatible" a
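
Since the version has to match everywhere, one way to spot drift is to ask the controller which slurmd version each node registered with. This is a sketch that assumes the `Version=` field printed per node by `scontrol -o show nodes`; a node whose slurmd has never registered could report an empty or unusual value and would then show up as a separate version.

```bash
#!/bin/bash
# Count how many nodes report each distinct slurmd version.
# Relies on the Version= field in `scontrol -o show nodes` (one line per node).
slurmd_versions() {
  scontrol -o show nodes | grep -o 'Version=[^ ]*' | sort | uniq -c
}

# Succeeds only if every node reports the same version.
versions_consistent() {
  [ "$(slurmd_versions | wc -l)" -le 1 ]
}
```

Running `slurmd_versions` on any host with a working client configuration prints one line per version found, prefixed with the number of nodes running it.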

Re: [slurm-users] Autodetect of nvml is not working in

2023-11-30 Thread Shunran Zhang
Hi Ravi, Unfortunately, if the NVML flag is off at compile time (when the maintainer builds the apt package for you to install), that part of the code will not be in your binary. Recompile it yourself following the official documentation, or find a repository that builds Slurm with NVML are

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Shunran Zhang
Hi all, Apologies for writing something misleading in my last mail; I missed your error message. Rob was correct: your slurmd appears not to have been compiled with the NVML flag. You need to set up NVML and turn on the --with-nvml flag when configuring Slurm to fix the issue if you are compil
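
A quick way to tell whether an installed Slurm build includes NVML support is to look for the NVML GPU plugin, `gpu_nvml.so`, which is only built when NVML is enabled at configure time. A minimal sketch, assuming the RHEL-style plugin directory `/usr/lib64/slurm` mentioned in the thread as the default; Debian/Ubuntu packages may install plugins elsewhere, so pass the directory explicitly there.

```bash
#!/bin/bash
# Succeed if the given Slurm plugin directory contains the NVML GPU plugin.
# Default directory is an assumption (RHEL-style layout).
has_nvml_plugin() {
  local plugin_dir=${1:-/usr/lib64/slurm}
  [ -f "${plugin_dir}/gpu_nvml.so" ]
}
```

For example, `has_nvml_plugin /usr/lib64/slurm || echo "rebuild with --with-nvml"` flags a build that cannot honor `AutoDetect=nvml`.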

Re: [slurm-users] Autodetect of nvml is not working in gres.conf

2023-11-30 Thread Shunran Zhang
Hi all, If you could offer a few more details on your OS and Slurm version, that might shed some light. There is an interesting detail about the NVML package if you are using a RHEL-like OS. The NVML detection part of the Slurm library (/usr/lib64/slurm/gpu_nvml.so) is linked against the /lib
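
Because the plugin is linked against a specific NVML library location (the path is cut off above), it can help to see what the dynamic loader actually resolves. A sketch using `ldd`, assuming the plugin path from the message as the default and that the NVML dependency is named `libnvidia-ml` (the usual name of NVIDIA's NVML library).

```bash
#!/bin/bash
# Print how ldd resolves the NVML dependency of Slurm's gpu_nvml.so plugin.
# Returns non-zero if libnvidia-ml is not a dependency or cannot be found.
check_nvml_linkage() {
  local plugin=${1:-/usr/lib64/slurm/gpu_nvml.so}
  local line
  line=$(ldd "$plugin" | grep 'libnvidia-ml') || return 1
  echo "$line"
  # ldd prints "=> not found" when the loader cannot locate the library
  case "$line" in
    *"not found"*) return 1 ;;
  esac
}
```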

Re: [slurm-users] Submitting jobs from machines outside the cluster

2023-08-27 Thread Shunran Zhang
Hi Steve, The requirements for a client node, as I tested, are:
* a munge daemon for authentication
* a mechanism for the client to obtain the configuration
So yes, I believe you would need munge working on the submitting machine. For the configuration, I used to keep a copy of the Slurm config in /etc/slurm in the cli
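
The two requirements above can be checked from the submit host. This sketch encodes and decodes a munge credential locally and then runs `sinfo` as a stand-in for "can this host find its config and reach slurmctld"; it assumes client commands exit non-zero when they cannot contact the controller. A local round-trip does not prove the munge key matches the cluster's, so `munge -n | ssh <controller> unmunge` is the stronger test.

```bash
#!/bin/bash
# Sanity checks for a submit-only client, mirroring the two requirements above.
check_submit_host() {
  # 1. munge: encode a credential and decode it locally (fails if munged is down)
  if ! munge -n | unmunge >/dev/null 2>&1; then
    echo "munge credential round-trip failed" >&2
    return 1
  fi
  # 2. configuration: sinfo needs a usable config to reach slurmctld
  if ! sinfo -h >/dev/null 2>&1; then
    echo "sinfo failed; check how this host obtains slurm.conf" >&2
    return 1
  fi
  echo "ok"
}
```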

Re: [slurm-users] Custom Gres for SSD

2023-07-24 Thread Shunran Zhang
```bash
  xfs_quota -x -c "limit -p bsoft=0m bhard=0m ${SLURM_JOBID}" "${local_dir}"
  # remove the folder
  if [[ -d "${SLURM_TMPDIR}" ]]; then
    rm -rf --one-file-system "${SLURM_TMPDIR}"
  fi
  exit 0
```
To use project quotas, you need to enable them with this mount flag: pquota in
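
The epilog above resets a project quota whose project ID is the job ID, which implies a prolog that created it. A hypothetical sketch of such a counterpart (not from the original post): it assumes the XFS filesystem at `local_dir` is mounted with `pquota`, assumes a per-job directory at `${local_dir}/${SLURM_JOBID}`, and uses a placeholder limit; adjust the layout so it matches the `SLURM_TMPDIR` the epilog removes.

```bash
#!/bin/bash
# Hypothetical prolog counterpart: create a per-job directory and cap it with
# an XFS project quota whose project ID is the job ID (reset to 0 by the epilog).
setup_job_scratch() {
  local local_dir=$1   # XFS mount point mounted with pquota (assumed)
  local limit=$2       # hard block limit, e.g. 100g (placeholder)
  # Assumed layout; should match the SLURM_TMPDIR the epilog removes
  local job_dir="${local_dir}/${SLURM_JOBID}"

  mkdir -p "${job_dir}"
  # Mark the directory tree as belonging to project ${SLURM_JOBID}
  xfs_quota -x -c "project -s -p ${job_dir} ${SLURM_JOBID}" "${local_dir}"
  # Set the hard block limit for that project
  xfs_quota -x -c "limit -p bhard=${limit} ${SLURM_JOBID}" "${local_dir}"
}
```

Called as `setup_job_scratch /scratch 100g` (both values placeholders), it would give each job its own capped directory that the epilog can later clean up.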

[slurm-users] Custom Gres for SSD

2023-07-23 Thread Shunran Zhang
Hi all, I am attempting to set up a GRES to manage jobs that need scratch space, but only a few of our compute nodes are equipped with SSDs for such scratch space. Originally I set up a new partition for those I/O-bound jobs, but it turned out that those jobs might be allocated to the same node

Re: [slurm-users] [EXT] --mem is not limiting the job's memory

2023-06-23 Thread Shunran Zhang
Hi, Would you mind checking your job scheduling settings in slurm.conf? Namely SelectTypeParameters=CR_CPU_Memory or the like. Also, you may want to use systemd-cgtop to at least confirm that jobs are indeed running in cgroups. Sincerely, S. Zhang On Fri, Jun 23, 2023, 12:07 Boris Yazlovitsky
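
A small check along the lines suggested: memory only becomes a scheduled resource when SelectTypeParameters includes it (CR_CPU_Memory, CR_Core_Memory, CR_Socket_Memory or CR_Memory), and enforcing it through cgroups additionally relies on TaskPlugin including task/cgroup (with ConstrainRAMSpace=yes in cgroup.conf, which this sketch does not check). The default path `/etc/slurm/slurm.conf` is an assumption.

```bash
#!/bin/bash
# Warn if slurm.conf does not treat memory as a consumable resource
# or does not load the cgroup task plugin.
check_mem_enforcement() {
  local conf=${1:-/etc/slurm/slurm.conf}
  local status=0
  if ! grep -Eiq '^[[:space:]]*SelectTypeParameters[[:space:]]*=.*Memory' "$conf"; then
    echo "SelectTypeParameters does not include memory (e.g. CR_CPU_Memory)" >&2
    status=1
  fi
  if ! grep -Eiq '^[[:space:]]*TaskPlugin[[:space:]]*=.*task/cgroup' "$conf"; then
    echo "TaskPlugin does not include task/cgroup" >&2
    status=1
  fi
  return "$status"
}
```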

Re: [slurm-users] sbatch does not work with Debian image

2023-03-14 Thread Shunran Zhang
The error message says that Slurm cannot find the Slurm config file. Do you have a local copy of /etc/slurm/*, do you share /etc/slurm over NFS, or are you using DNS and configless Slurm? Sincerely, S. Zhang > Sorin Draga wrote on 2023/03/14 18:49: > > Hello everyone, > > I'm trying to run the new Debian i
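
To see which of the options above applies on a given host, a sketch that looks for a readable config the way client commands commonly do: the `SLURM_CONF` environment variable first, then a default path. The default `/etc/slurm/slurm.conf` is an assumption (packagers may compile in a different directory), and a configless setup has no local file, so this check reports failure there by design.

```bash
#!/bin/bash
# Print the slurm.conf path a client would most likely read, or fail.
find_slurm_conf() {
  local default=${1:-/etc/slurm/slurm.conf}
  if [ -n "${SLURM_CONF:-}" ]; then
    if [ -r "$SLURM_CONF" ]; then
      echo "$SLURM_CONF"
      return 0
    fi
    echo "SLURM_CONF is set but $SLURM_CONF is not readable" >&2
    return 1
  fi
  if [ -r "$default" ]; then
    echo "$default"
    return 0
  fi
  echo "no slurm.conf at $default; copy or share /etc/slurm, or use configless" >&2
  return 1
}
```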