Hi all,

there seems to be a host-order-dependent timing issue. The issue occurs when a set of processes is placed on the same node. mpirun of the job exits at MPI_Init() with:

num local peers failed --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS

Non-MPI applications launch just fine, such as:

$ mpirun -np 36 --hostfile job_4.machines --mca rmaps seq --map-by node --bind-to none --mca btl_openib_allow_ib 1 /usr/bin/hostname

The "num local peers failed" error shows up already with a simple MPI program that does nothing but call MPI_Init(), e.g.:

$ mpirun -np 36 --hostfile job_4.machines --mca rmaps seq --map-by node --bind-to none --mca btl_openib_allow_ib 1 /cluster/testing/mpihelloworld
num local peers failed --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS

There is zero documentation on this error and I don't know how to work around it. The error occurs with Open MPI 4.1.1 and 4.0.4; we have not tried other versions yet.

Curiously, if the hostfile is sorted first, MPI_Init() always succeeds, e.g.:

$ sort job_4.machines > job_4.machines-sorted
$ mpirun -np 36 --hostfile job_4.machines-sorted --mca rmaps seq --map-by node --bind-to none --mca btl_openib_allow_ib 1 /cluster/difx/DiFX-trunk_64/bin/mpihelloworld
(--> success)

My guess is that when two entries for the same node appear in the hostfile but are too far apart (too many other nodes listed in between), MPI_Init() of one instance might be checking much too soon for the other instance? A sketch of the kind of hostfile layout I mean follows.
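For illustration only (hypothetical node names, not the real contents of job_4.machines): an unsorted hostfile along these lines triggers the error, while its sorted counterpart does not, presumably because the repeated entries end up adjacent:

$ cat job_4.machines            # fails: node07 listed twice, far apart
node07
node01
node12
node03
node09
node07

$ cat job_4.machines-sorted     # works: the two node07 entries are adjacent
node01
node03
node07
node07
node09
node12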
Alas, we have a heterogeneous cluster where the rank-to-node mapping is critical (with the seq mapper, rank order follows the hostfile line order), so sorting the hostfile is not a usable workaround for us. Does Open MPI 4.1 have any "grace time" parameter or similar, which would allow processes to wait a bit longer for the expected other instance(s) to eventually come up on the same node?

many thanks,
Jan
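P.S. I assume that if such a parameter exists it would show up in the MCA parameter list; the kind of query I have in mind (I have not found the right name in the documentation so far) is:

$ ompi_info --param all all --level 9 | grep -i timeout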