Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-03-02 Thread Analabha Roy
On Wed, 1 Mar 2023 at 07:51, Doug Meyer wrote: > Hi, I forgot one thing you didn't mention. When you change the node descriptors and partitions you also have to restart slurmctld. scontrol reconfigure works for the nodes, but the main daemon has to be told to reread the config. Until

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-28 Thread Doug Meyer
Hi, I forgot one thing you didn't mention. When you change the node descriptors and partitions you also have to restart slurmctld. scontrol reconfigure works for the nodes, but the main daemon has to be told to reread the config. Until you restart the daemon it will be referencing the config from
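A minimal sketch of that sequence, assuming a systemd-managed install where the controller runs as the slurmctld service (service and unit names may differ on other setups):

    # propagate edits to slurm.conf node/partition lines to the daemons
    sudo scontrol reconfigure
    # the controller itself still needs a restart to reread the config
    sudo systemctl restart slurmctld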

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-26 Thread Analabha Roy
Hey, Thanks for sticking with this. On Sun, 26 Feb 2023 at 23:43, Doug Meyer wrote: > Hi, Suggest removing "boards=1". The docs say to include it, but in previous discussions with SchedMD we were advised to remove it. I just did, then ran scontrol reconfigure. > When you are runni

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-26 Thread Doug Meyer
Hi, Suggest removing "boards=1". The docs say to include it, but in previous discussions with SchedMD we were advised to remove it. When you are running, execute "scontrol show node <nodename>" and look at the lines ConfigTRES and AllocTRES. The former is what the maître d' believes is available; the latter
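For illustration, a hedged sketch of that check on a hypothetical node named node001 (the values are made up; recent Slurm prints the configured field as CfgTRES):

    $ scontrol show node node001
    ...
    CfgTRES=cpu=64,mem=257000M,billing=64
    AllocTRES=cpu=32,mem=64000M
    ...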

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-26 Thread Analabha Roy
Hi Doug, Again, many thanks for your detailed response. Based on my understanding of your previous note, I did the following: I set the NodeName with CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 and the partitions with OverSubscribe=FORCE:2, then I put further restrictio
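A sketch of what those slurm.conf lines might look like; the node name, memory, and partition name below are placeholders, not the poster's actual values:

    NodeName=node001 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=250000
    PartitionName=main Nodes=node001 Default=YES State=UP OverSubscribe=FORCE:2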

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-25 Thread Doug Meyer
Hi, You got me, I didn't know that "oversubscribe=FORCE:2" is an option. I'll need to explore that. I missed the question about srun. srun is the preferred method, I believe. I am not associated with drafting the submit scripts but can ask my peer. You do need to stipulate the number of cores you wa
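A hedged example of stipulating the core count explicitly on the srun line (the program name and counts are illustrative):

    # one task with 8 CPUs bound to it; adjust to what the job actually needs
    srun -n 1 -c 8 ./my_program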

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-25 Thread Analabha Roy
Hi, Thanks for your considered response. A couple of questions linger... On Sat, 25 Feb 2023 at 21:46, Doug Meyer wrote: > Hi, Declaring cores=64 will absolutely work, but if you start running MPI you'll want a more detailed config description. The easy way to read it is "128 = 2 sockets *

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-25 Thread Doug Meyer
Hi, Declaring cores=64 will absolutely work, but if you start running MPI you'll want a more detailed config description. The easy way to read it is "128 = 2 sockets * 32 cores per socket * 2 threads per core". NodeName=hpc[306-308] CPUs=128 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 RealMemory=512
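Spelled out, the reading Doug gives is simply the product of the topology fields; the same arithmetic applies to any node line:

    CPUs = Sockets x CoresPerSocket x ThreadsPerCore = 2 x 32 x 2 = 128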

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
Howdy, and thanks for the warm welcome, On Fri, 24 Feb 2023 at 07:31, Doug Meyer wrote: > Hi, Did you configure your node definition with the output of slurmd -C? Ignore boards. Don't know if it is still true, but several years ago declaring boards made things difficult. $ slurmd -C

Re: [slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Doug Meyer
Hi, Did you configure your node definition with the output of slurmd -C? Ignore boards. Don't know if it is still true, but several years ago declaring boards made things difficult. Also, if you have hyperthreaded AMD or Intel processors your partition declaration should be oversubscribe:2. Start w
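For illustration, hedged sample output of slurmd -C on a hypothetical dual-socket, hyperthreaded node; the real values must come from the machine itself:

    $ slurmd -C
    NodeName=node001 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=2 RealMemory=257000
    UpTime=12-03:45:07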

[slurm-users] Single Node cluster. How to manage oversubscribing

2023-02-23 Thread Analabha Roy
Hi folks, I have a single-node "cluster" running Ubuntu 20.04 LTS with the distribution packages for Slurm (slurm-wlm 19.05.5). Slurm only ran one job on the node at a time with the default configuration, leaving all other jobs pending. This happened even if that one job only requested a few c