Hi Chris,

Christopher Benjamin Coffey <chris.cof...@nau.edu> writes:

> Hi Loris,
>
>> But that's only the case if the program is started with srun or some
>> form of mpirun. Otherwise the program just gets started once on one
>> core and the other cores just idle.
>
> Yes, maybe that’s true about what you say when not using srun. I'm not
> sure, as we tell everyone to use srun to launch every type of task.

OK, I'm confused now. Our main culprit for producing processes with
incorrect affinity is ORCA [1]. It uses OpenMPI but also likes to start
processes asynchronously via SSH within the node set. Our users run
their jobs via batch files containing, say

  #SBATCH --ntasks=8
  ...
  $ORCA_PATH/orca ...

However, if I run an ORCA job with 'srun', i.e.

  #SBATCH --ntasks=8
  ...
  srun $ORCA_PATH/orca ...

this results in the program being run 8 times with all of them writing
to the same log and output files.

Is ORCA just a pathological exception to the idea that it's always good
to use 'srun'? (As it causes well over 95% of our affinity problems, it
is already pathological in that sense.)

Cheers,

Loris

Footnotes:
[1] https://orcaforum.cec.mpg.de/

-- 
Dr. Loris Bennett (Mr.)
ZEDAT, Freie Universität Berlin
Email loris.benn...@fu-berlin.de