Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
Thanks for all your help, Kevin. I really did miss the OverSubscribe option in the docs :-( But now CPU job scheduling is working and I have a picture of the problem with GPU job scheduling to dig further :-) On Fri, 13 Jan 2023 at 13:01, Kevin Broch wrote: > Sorry to hear that. Hopefully o

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Kevin Broch
Sorry to hear that. Hopefully others in the group have some ideas/explanations. I haven't had to deal with GPU resources in Slurm. On Fri, Jan 13, 2023 at 4:51 AM Helder Daniel wrote: > Oh, ok. > I guess I was expecting that the GPU job was suspended copying GPU memory > to RAM memory. > > I tr

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
Oh, ok. I guess I was expecting that the GPU job would be suspended by copying GPU memory to main memory. I also tried REQUEUE,GANG and CANCEL,GANG. Neither of these options seems to be able to preempt GPU jobs. On Fri, 13 Jan 2023 at 12:30, Kevin Broch wrote: > My guess, is that this isn't possible with

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
PS: I checked the resources while running the 3 GPU jobs, which were launched with: sbatch --gpus-per-task=2 --cpus-per-task=1 cnn-multi.sh The server has 64 cores (32 x2 with hyperthreading) cat /proc/cpuinfo | grep processor | tail -n1 processor : 63 128 GB main memory: hdaniel@asimov:~/Wor

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Kevin Broch
My guess is that this isn't possible with GANG,SUSPEND. GPU memory isn't managed in Slurm, so the idea of suspending a job's GPU memory so that another job can use it simply isn't possible. On Fri, Jan 13, 2023 at 4:08 AM Helder Daniel wrote: > Hi Kevin > > I did a "scontrol show partition". > Oversub

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Helder Daniel
Hi Kevin, I did a "scontrol show partition". OverSubscribe was not enabled. I enabled it in slurm.conf with: (...) GresTypes=gpu NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN PartitionName=asimov01 *OverSubscribe=FORCE* Nodes=asimov Default=YES MaxTime=INFINI

Re: [slurm-users] Cannot enable Gang scheduling

2023-01-13 Thread Kevin Broch
The problem might be that OverSubscribe is not enabled? Without it, I don't believe the time-slicing can be GANG scheduled. Can you do a "scontrol show partition" to verify that it is? On Thu, Jan 12, 2023 at 6:24 PM Helder Daniel wrote: > Hi, > > I am trying to enable gang scheduling on a server with
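
For readers following the thread, here is a rough sketch of the slurm.conf settings being discussed. It is not the poster's actual file. The node and partition names, the GRES line, the 30 second time slice and OverSubscribe=FORCE come from the messages in this thread; SelectType=select/cons_tres and State=UP are assumptions added for illustration, and the poster's MaxTime value is left out because it is cut off in the archive.

```
# slurm.conf fragment (illustrative sketch only, not the poster's file)
SelectType=select/cons_tres    # assumption: not shown in the thread
PreemptMode=SUSPEND,GANG       # GANG turns on time slicing, SUSPEND pauses the job whose slice ended
SchedulerTimeSlice=30          # seconds per slice, as in the original post
GresTypes=gpu
NodeName=asimov Gres=gpu:4 Sockets=1 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=asimov01 OverSubscribe=FORCE Nodes=asimov Default=YES State=UP
```

After editing, the daemons need to re-read the file (for example with "scontrol reconfigure"), and "scontrol show partition" should then report the OverSubscribe setting for asimov01. As noted elsewhere in the thread, this got CPU jobs time slicing but did not by itself make GPU jobs alternate.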

[slurm-users] Cannot enable Gang scheduling

2023-01-12 Thread Helder Daniel
Hi, I am trying to enable gang scheduling on a server with a CPU with 32 cores and 4 GPUs. However, with gang scheduling enabled, the CPU jobs (or GPU jobs) are not being preempted after the time slice, which is set to 30 secs. Below is a snapshot of squeue. There are 3 jobs each needing 32 cores. The first