G'Day all, I've had a question from a user of our new HPC; the following should explain it:
➜ srun -N 1 --cpus-per-task 8 --time 01:00:00 --mem 2g --pty python3
Python 3.6.8 (default, Nov 16 2020, 16:55:22)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.cpu_count()
256
>>> len(os.sched_getaffinity(0))
256
>>>

The output of os.cpu_count() is correct: there are 256 CPUs on the server. But the output of len(os.sched_getaffinity(0)) is still 256, when I was expecting it to be 8 - the number of CPUs this process is restricted to. Is my slurm command incorrect?

When I run a similar test on XXXXXX I get the expected behaviour:

➜ qsub -I -l select=1:ncpus=4:mem=1gb
qsub: waiting for job 9616042.pbs to start
qsub: job 9616042.pbs ready
➜ python3
Python 3.4.10 (default, Dec 13 2019, 16:20:47) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.cpu_count()
72
>>> len(os.sched_getaffinity(0))
4
>>>

This is a real problem for me, because a program supplied by a third-party company keeps trying to run with 256 threads and crashes. The program is a compiled binary, so I don't know whether it simply grabs the total CPU count or correctly queries the scheduler affinity, but it seems that on TRI's HPC both calls return the total number of CPUs in any case. The program has no option to set the number of threads manually.

My question to the group is: what's causing this? Do I need a cgroups plugin? I think these are the relevant lines from the slurm.conf file:

SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
ReturnToService=1
CpuFreqGovernors=OnDemand,Performance,UserSpace
CpuFreqDef=Performance

Sid Young
Translational Research Institute
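P.S. For anyone reproducing this: the distinction the transcripts above rely on is that os.cpu_count() always reports the machine total, while os.sched_getaffinity(0) reports the CPUs the kernel will actually let the process run on (which is what a Slurm affinity or cgroup task plugin constrains). A minimal sketch of how a well-behaved program would size its thread pool - the function name effective_cpu_count is just illustrative, not from any library:

import os

def effective_cpu_count():
    """Number of CPUs this process is actually allowed to use.

    sched_getaffinity(0) reflects the current affinity mask, so it
    shrinks when a scheduler pins the job to a subset of cores;
    cpu_count() is the machine-wide total and never shrinks.
    sched_getaffinity is Linux-only, hence the fallback.
    """
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:
        return os.cpu_count() or 1

print("total CPUs:    ", os.cpu_count())
print("usable CPUs:   ", effective_cpu_count())

On the PBS system above this would print 72 and 4; on the Slurm node it prints 256 for both, which is the symptom in question.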