Hi Jeffrey and Antony,

Thanks a lot for your valuable help and all the info. While waiting for the running jobs on the server to finish, I tested on my PC following your instructions, and it works perfectly. I set `SelectTypeParameters=CR_CPU` and configured `CPUS=` without specifying `CoresPerSocket=` or `ThreadsPerCore=`. This does give the expected behavior I am looking for.
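For reference, the relevant lines on my PC look roughly like this (a minimal sketch; the node name and counts below are placeholders for my local machine, not the real server):

SelectType=select/cons_res
SelectTypeParameters=CR_CPU
NodeName=mypc CPUs=8 RealMemory=16000 State=UNKNOWN

On the server I expect the equivalent change is just switching CR_Core to CR_CPU and dropping the Boards/Sockets/CoresPerSocket/ThreadsPerCore fields from the moria node line quoted below, but I will confirm once the running jobs finish.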
Hi Cyrus,

Although I have not tested on the server yet, I expect the solution above will work there as well. Thanks! The gres.conf on the server is:

Name=gpu Type=gtx1080ti File=/dev/nvidia0
Name=gpu Type=gtx1080ti File=/dev/nvidia1
Name=gpu Type=titanv File=/dev/nvidia2
Name=gpu Type=titanv File=/dev/nvidia3
Name=gpu Type=titanv File=/dev/nvidia4
Name=gpu Type=v100 File=/dev/nvidia5
Name=gpu Type=gp100 File=/dev/nvidia6
Name=gpu Type=gp100 File=/dev/nvidia7

The submission script is:

#!/bin/bash
#SBATCH --job-name=US_Y285_TTP_GDP
#SBATCH --output=test_%j.out
#SBATCH --error=test_%j.err
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --time=600:00:00
#SBATCH --mem-per-cpu=2000
#SBATCH --gres=gpu:1

These just look normal to me.
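One follow-up on the gres.conf pointer: if I later want each GPU associated with specific cores for locality, my understanding is that gres.conf lines can also take a Cores= field (CPUs= in older Slurm versions). A hypothetical sketch, with made-up core numbers rather than my real topology:

Name=gpu Type=gtx1080ti File=/dev/nvidia0 Cores=0-3
Name=gpu Type=gtx1080ti File=/dev/nvidia1 Cores=4-7

I have not tried this yet, so please treat it only as something to double-check against https://slurm.schedmd.com/gres.conf.html.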
Xiang Gao

On Fri, Feb 8, 2019 at 12:40 PM, Cyrus Proctor <cproc...@tacc.utexas.edu> wrote:
> Xiang,
>
> From what I've seen of the original question, gres.conf may be another place
> to verify the setup that only one core is being allocated per gpu request:
> https://slurm.schedmd.com/gres.conf.html
>
> Seeing the run submission line and gres.conf might help others give you
> further advice.
>
> To Jeffrey's email: the concept of oversubscription may be beneficial
> versus changing resource inventories:
> https://slurm.schedmd.com/cons_res_share.html
>
> Best,
>
> Cyrus
>
> On 2/8/19 9:44 AM, Jeffrey Frey wrote:
>
> Documentation for CR_CPU:
>
> CR_CPU
> CPUs are consumable resources. Configure the number of CPUs on each node,
> which may be equal to the count of cores or hyper-threads on the node
> depending upon the desired minimum resource allocation. The node's Boards,
> Sockets, CoresPerSocket and ThreadsPerCore may optionally be configured and
> result in job allocations which have improved locality; *however doing so
> will prevent more than one job from being allocated on each core.*
>
> So once you've configured node(s) with ThreadsPerCore=N, the cons_res
> plugin still forces tasks to span all threads on a core. Elsewhere in the
> documentation it is stated:
>
> *Note that Slurm can allocate resources to jobs down to the resolution of
> a core.*
>
> So you MUST treat a thread as a core if you want to schedule individual
> threads. I can confirm this using the config:
>
> SelectTypeParameters = CR_CPU_MEMORY
> NodeName=n[003,008] CPUS=16 Sockets=2 CoresPerSocket=4 ThreadsPerCore=2
>
> Submitting a 1-cpu job, if I check the cpuset assigned to a job on n003:
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 4,12
>
> If I instead configure as:
>
> SelectTypeParameters = CR_Core_Memory
> NodeName=n[003,008] CPUS=16 Sockets=2 CoresPerSocket=8 ThreadsPerCore=1
>
> Slurm will schedule "cores" 0-15 to jobs, which the cpuset cgroup happily
> accepts. A 1-cpu job then shows:
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 2
>
> and a 2-cpu job shows:
>
> $ cat /sys/fs/cgroup/cpuset/slurm/{uid}/{job}/cpuset.cpus
> 4,12
>
> On Feb 8, 2019, at 5:09 AM, Antony Cleave <antony.cle...@gmail.com> wrote:
>
> if you want slurm to just ignore the difference between physical and
> logical cores then you can change
> SelectTypeParameters=CR_Core
> to
> SelectTypeParameters=CR_CPU
> and then it will treat threads as CPUs and it will let you start the
> number of tasks you expect
>
> Antony
>
> On Thu, 7 Feb 2019 at 18:04, Jeffrey Frey <f...@udel.edu> wrote:
>
> Your nodes are hyperthreaded (ThreadsPerCore=2). Slurm always allocates
> _all threads_ associated with a selected core to jobs. So you're being
> assigned both threads on core N.
>
> On our development-partition nodes we configure the threads as cores, e.g.
>
> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=8
> ThreadsPerCore=1
>
> to force Slurm to schedule the threads separately.
>
> On Feb 7, 2019, at 12:10 PM, Xiang Gao <qasdfgtyu...@gmail.com> wrote:
>
> Hi All,
>
> We configured slurm on a server with 8 GPUs and 16 CPUs and want to use
> slurm to schedule both CPU and GPU jobs. We observed an unexpected
> behavior: although there are 16 CPUs, slurm only schedules 8 jobs to run,
> even if some jobs are not asking for any GPU. If I inspect detailed
> information using `scontrol show job`, I see something strange on a job
> that asks for just 1 CPU:
>
> NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1
>
> If I understand these concepts correctly, as the number of nodes is 1, the
> number of tasks is 1, and the number of cpus/task is 1, in principle there
> is no way that the final number of CPUs is 2. I'm not sure whether I
> misunderstand the concepts, configured slurm wrongly, or this is a bug, so
> I come here for help.
>
> Some related config lines are:
>
> # COMPUTE NODES
> NodeName=moria CPUs=16 Boards=1 SocketsPerBoard=2 CoresPerSocket=4
> ThreadsPerCore=2 RealMemory=120000
> Gres=gpu:gtx1080ti:2,gpu:titanv:3,gpu:v100:1,gpu:gp100:2
> State=UNKNOWN
> PartitionName=queue Nodes=moria Default=YES MaxTime=INFINITE State=UP
>
> # SCHEDULING
> FastSchedule=1
> SchedulerType=sched/backfill
> GresTypes=gpu
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core
>
> Best,
> Xiang Gao
>
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::
> Jeffrey T. Frey, Ph.D.
> Systems Programmer V / HPC Management
> Network & Systems Services / College of Engineering
> University of Delaware, Newark DE 19716
> Office: (302) 831-6034 Mobile: (302) 419-4976
> ::::::::::::::::::::::::::::::::::::::::::::::::::::::