Hi,

I have a user trying to set up a heterogeneous job with a single 
MPI_COMM_WORLD, using the following batch script:

==========
#!/bin/bash
#SBATCH --job-name=hetero              
#SBATCH --output=/scratch/cbc/hetero.txt
#SBATCH --time=2:00                    
#SBATCH --workdir=/scratch/cbc          
#SBATCH --cpus-per-task=1 --mem-per-cpu=2g --ntasks=1 -C sb
#SBATCH packjob
#SBATCH --cpus-per-task=1 --mem-per-cpu=1g  --ntasks=1 -C sl
#SBATCH --mail-type=BEGIN,END

module load openmpi/3.1.2-gcc-6.2.0

srun --pack-group=0,1 ~/hellompi 
===========


Yet we get an error: "srun: fatal: Job steps that span multiple components of 
a heterogeneous job are not currently supported". But the heterogeneous jobs 
docs (https://slurm.schedmd.com/heterogeneous_jobs.html) seem to indicate it 
should work:

IMPORTANT: The ability to execute a single application across more than one job 
allocation does not work with all MPI implementations or Slurm MPI plugins. 
Slurm's ability to execute such an application can be disabled on the entire 
cluster by adding "disable_hetero_steps" to Slurm's SchedulerParameters 
configuration parameter.
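
For what it's worth, here is a quick way to check whether 
disable_hetero_steps is set (just a sketch, assuming a standard install 
with scontrol on the PATH):

==========
# Show the SchedulerParameters line from the running config;
# disable_hetero_steps would appear here if it were set.
scontrol show config | grep -i SchedulerParameters
==========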

By default, the applications launched by a single execution of the srun command 
(even for different components of the heterogeneous job) are combined into one 
MPI_COMM_WORLD with non-overlapping task IDs.
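
To separate the Slurm side from the MPI side, I'd expect a plain (non-MPI) 
step spanning both groups to show those non-overlapping task IDs; if even 
this fails with the same fatal error, the problem would seem to be in the 
Slurm config rather than in the MPI library. A sketch using standard Slurm 
environment variables:

==========
# Each task prints its global rank within the step; with one task per
# pack group we'd expect ranks 0 and 1 across the two node types.
srun --pack-group=0,1 bash -c 'echo "$(hostname): task ${SLURM_PROCID} of ${SLURM_NTASKS}"'
==========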

Does this not work with Open MPI? If not, which MPI/Slurm configuration does 
work? We currently have MpiDefault=pmi2 in slurm.conf. I've also tried a more 
recent Open MPI, as well as MPICH and MVAPICH2.
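
In case it's relevant, here's how we can confirm which MPI plugins our Slurm 
build actually offers and what the default is (a sketch; srun --mpi=list and 
scontrol show config are standard):

==========
# List the MPI plugin types this Slurm build supports (e.g. pmi2, pmix, none)
srun --mpi=list

# Confirm the configured default MPI plugin
scontrol show config | grep -i MpiDefault
==========

If pmix shows up in that list, would launching with srun --mpi=pmix be the 
expected combination here?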

Any help would be appreciated, thanks!

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167