[slurm-users] Issue with "hetjob" directive with heterogeneous job submission script

2020-03-04 Thread CB
Hi, I'm running Slurm 19.05.5. I've tried to write a job submission script for a heterogeneous job following the example at https://slurm.schedmd.com/heterogeneous_jobs.html, but it failed with the following error message: $ sbatch new.bash sbatch: error: Invalid directive found in batch script:

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Loris Bennett
Hi Marcus, Thanks for the clarification - I'd actually missed the 'SMT' in the subject. Marcus Wagner writes: > Hi Loris, > > CPU is the smallest schedulable unit, in case of SMT it's threads. Would it be reasonable to say it's *always* threads and with HT you just have twice as many as without? H

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Alexander Grund
> What is your hardware configuration?  Do you have 1 server with 44 processor sockets, and each processor has 4 CPU cores?  Or is it maybe 1 server with 1 or more sockets for a total of 44 CPU cores, and each CPU core is running 4 hyperthreads? 1 server, 2 sockets, 22 cores each, 4 hyperthrea

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread William Brown
What Marcus reports is quite correct. It can be confusing; I think Slurm uses 'CPU' as a non-specific term to mean 'the smallest assignable compute object'. With SMT enabled that is the thread, and with it disabled it is the core. We were told by the company that installed the cluster at m
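As a rough illustration of the reading given in this thread, the sketch below counts "CPUs" as hardware threads when SMT is enabled and as cores when it is disabled. The function name `schedulable_cpus` is hypothetical and not part of Slurm; the numbers are the Power9 node described elsewhere in the thread (2 sockets, 22 cores each, 4 threads per core). How Slurm actually allocates resources also depends on the site's configuration, which this sketch ignores.

```python
def schedulable_cpus(sockets: int, cores_per_socket: int,
                     threads_per_core: int, smt_enabled: bool = True) -> int:
    """Count 'CPUs' under the interpretation discussed in this thread.

    Hypothetical helper for illustration only; it is not a Slurm API.
    """
    cores = sockets * cores_per_socket
    # With SMT on, each hardware thread counts as one CPU;
    # with SMT off, each physical core counts as one CPU.
    return cores * threads_per_core if smt_enabled else cores


# Power9 node from the thread: 2 sockets, 22 cores each, 4 threads per core.
print(schedulable_cpus(2, 22, 4))                     # 176, matching CPUTot=176
print(schedulable_cpus(2, 22, 4, smt_enabled=False))  # 44
```

Under this reading, "CoresPerSocket=22" and "CPUTot=176" are consistent: 2 × 22 = 44 cores, and 44 × 4 = 176 threads.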

Re: [slurm-users] salloc not working in configless setup on login machine

2020-03-04 Thread Angelos Ching
Hi Gizo, I noticed SLURM_CONF was set to a broken socket when inside salloc; that's why sinfo was confused. I've found a workaround: if I run "unset SLURM_CONF" before sinfo, then sinfo works. Maybe a bug needs to be reported for this. Best regards, Angelos On 3/4/20 2:07 AM, nan...@luis.un
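For scripts that run inside the salloc session, the same workaround can be sketched in Python by launching sinfo with SLURM_CONF removed from the child's environment. This is only a sketch of the workaround described above, not a fix for the underlying problem; the helper names `env_without_slurm_conf` and `run_sinfo` are hypothetical.

```python
import os
import subprocess


def env_without_slurm_conf(env=None):
    """Return a copy of the environment with SLURM_CONF removed, if present."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop("SLURM_CONF", None)  # equivalent of "unset SLURM_CONF"
    return cleaned


def run_sinfo():
    """Run sinfo without SLURM_CONF and return its standard output."""
    result = subprocess.run(
        ["sinfo"],
        env=env_without_slurm_conf(),
        capture_output=True,
        text=True,
        check=True,  # raise if sinfo exits with an error
    )
    return result.stdout
```

Calling `run_sinfo()` from within the salloc shell leaves the parent shell's SLURM_CONF untouched, so other tools that rely on it are unaffected.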

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Marcus Wagner
Hi Loris, CPU is the smallest schedulable unit; in the case of SMT it's threads. At the moment we have HT disabled on our systems, therefore CPU is equal to the cores for us. But with HT enabled, the CPU count is twice as large (at least from Slurm 18.08). Best Marcus On 3/4/20 10:33 AM, Loris Bennett

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Loris Bennett
Hi Alexander, Alexander Grund writes: > Hi, > > we have a Power9 partition with 44 processors having 4 cores each > totaling 176. > > `scontrol show node ` shows "CoresPerSocket=22" and "CPUTot=176" > which confuses me. Especially as `whypending` reports e.g. "172 cores > free: 1" What's 'whype

Re: [slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Ole Holm Nielsen
On 3/4/20 10:12 AM, Alexander Grund wrote: we have a Power9 partition with 44 processors having 4 cores each totaling 176. What is your hardware configuration? Do you have 1 server with 44 processor sockets, and each processor has 4 CPU cores? Or is it maybe 1 server with 1 or more sockets

[slurm-users] Meaning of --cpus-per-task and --mem-per-cpu when SMT processors are used

2020-03-04 Thread Alexander Grund
Hi, we have a Power9 partition with 44 processors having 4 cores each totaling 176. `scontrol show node ` shows "CoresPerSocket=22" and "CPUTot=176" which confuses me. Especially as `whypending` reports e.g. "172 cores free: 1" So what are "CPUs" and what are "Cores" to SLURM? Why does it