Thank you! We recently converted from PBS, and I was converting “ppn=X” to “-n X”. Does it make more sense to convert “ppn=X” to “--cpus-per-task=X”?
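As a sketch of what that conversion could look like for a single multi-threaded process (the core count, script contents, and binary name below are illustrative placeholders, not from the thread):

```shell
#!/bin/bash
# PBS directive being converted (illustrative):
#   #PBS -l nodes=1:ppn=16
#
# Slurm sketch for ONE process using 16 threads -- ppn maps to
# --cpus-per-task, not to -n:
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16

# Let the thread count follow the allocation (default to 1 outside Slurm):
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
./my_threaded_app        # placeholder binary

# By contrast, 16 single-core MPI ranks would instead use:
#   #SBATCH --ntasks=16
#   #SBATCH --cpus-per-task=1
```

Which mapping is right depends on whether the PBS job ran one threaded process or many single-threaded ones, as discussed below.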
Thanks again,
David

On Thu, Mar 24, 2022 at 3:54 PM Thomas M. Payerle <paye...@umd.edu> wrote:

> Although all three cases ("-N 1 --cpus-per-task 64 -n 1", "-N 1
> --cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause
> Slurm to allocate 64 cores to the job, there can (and will) be differences
> in other respects.
>
> The variable SLURM_NTASKS will be set to the argument of the -n (aka
> --ntasks) flag, and other Slurm variables will differ as well.
>
> More importantly, as others noted, srun will launch $SLURM_NTASKS
> processes. The mpirun/mpiexec/etc. binaries of most MPI libraries will (if
> compiled with support for Slurm) act similarly (and indeed, I believe most
> use srun under the hood).
>
> If you are just using sbatch and launching a single process with 64
> threads, then the different options are probably equivalent for most
> intents and purposes. The same applies if you are doing a loop to start 64
> single-threaded processes. But those are simplistic cases, and they just
> happen to "work" even though you are "abusing" the scheduler options. And
> even the cases where it "works" are subject to unexpected failures (e.g.,
> if one substitutes srun for sbatch).
>
> The differences are most clear when the -N 1 flag is not given.
> Generally, SLURM_NTASKS should be the number of MPI or similar tasks you
> intend to start. By default, it is assumed the tasks can support
> distributed-memory parallelism, so the scheduler assumes it can launch
> tasks on different nodes (the -N 1 flag you mentioned would override
> that). Each such task is assumed to need --cpus-per-task cores, which the
> scheduler assumes require shared-memory parallelism (i.e., they must be
> on the same node).
> So without the -N 1, "--cpus-per-task 64 -n 1" will require 64 cores on a
> single node, whereas "-n 64 --cpus-per-task 1" can result in the job being
> assigned anything from 64 cores on a single node to a single core on each
> of 64 nodes, or any combination in between totaling 64 cores. The
> "--cpus-per-task 32 -n 2" case will be assigned either one node with 64
> cores or 2 nodes with 32 cores each.
>
> As I said, although there are some simple cases where the different
> options are mostly functionally equivalent, I would recommend trying to
> use the proper arguments --- "abusing" the arguments might work for a
> while but will likely bite you in the end. E.g., the 64-thread case
> should use "--cpus-per-task 64", and the case launching processes in a
> loop should _probably_ use "-n 64" (assuming it can handle the tasks
> being assigned to different nodes).
>
> On Thu, Mar 24, 2022 at 3:35 PM David Henkemeyer <
> david.henkeme...@gmail.com> wrote:
>
>> Assuming -N is 1 (meaning, this job needs only one node), then is there
>> a difference between any of these 3 flag combinations:
>>
>> -n 64 (leaving cpus-per-task at the default of 1)
>> --cpus-per-task 64 (leaving -n at the default of 1)
>> --cpus-per-task 32 -n 2
>>
>> As far as I can tell, there is no functional difference. But if there is
>> even a subtle difference, I would love to know what it is!
>>
>> Thanks
>> David
>> --
>> Sent from Gmail Mobile
>>
>
> --
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads    paye...@umd.edu
> 5825 University Research Park        (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
>
--
Sent from Gmail Mobile
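The three combinations discussed in the thread can be summarized as submission sketches (job.sh is a placeholder script; partition and other site-specific options are omitted):

```shell
# Three ways to ask Slurm for 64 cores, without pinning to one node
# via -N 1, and how they differ in placement:

# (a) one 64-thread task: all 64 cores must land on a SINGLE node
sbatch --ntasks=1 --cpus-per-task=64 job.sh

# (b) 64 single-core tasks: may be spread across anywhere from 1 to
#     64 nodes, in any combination totaling 64 cores
sbatch --ntasks=64 --cpus-per-task=1 job.sh

# (c) two 32-core tasks: either one node with 64 cores, or two nodes
#     with 32 cores each
sbatch --ntasks=2 --cpus-per-task=32 job.sh

# Inside job.sh, `srun hostname` launches $SLURM_NTASKS processes, so
# it prints 1 line for (a), 64 lines for (b), and 2 lines for (c) --
# a quick way to see the task count each flag combination produces.
```

This makes the distinction in the thread concrete: the allocations are the same size, but srun (and Slurm-aware mpirun) treat them very differently.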