Thanks for responding. Here are some more details. I'm using Open MPI 4.0.2, compiled with the Portland Group compiler, pgc++ 19.5-0, with the build flags:
--enable-mpi-cxx --enable-cxx-exceptions --with-tm

The PBS/Torque version is 5.1.1. I launched the job with qsub:

qsub -V -j oe -e ./stdio -o ./stdio -f -X -N MyJob -l nodes=2:ppn=3 RunMyJob.bash

My abbreviated mpiexec command within RunMyJob.bash was:

mpiexec --enable-recovery -display-map --display-allocation --mca mpi_param_check 1 --v --x DISPLAY --np 2 --map-by ppr:1:node <myExecutable>

A simplified sketch of the MPI_Comm_spawn call I'm making is at the end of this message.

-----Original Message-----
From: Peter Kjellström <c...@nsc.liu.se>
Sent: Thursday, November 21, 2019 3:40 AM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Cc: users@lists.open-mpi.org
Subject: [EXTERNAL] Re: [OMPI users] Please help me interpret MPI output

On Wed, 20 Nov 2019 17:38:19 +0000
"Mccall, Kurt E. (MSFC-EV41) via users" <users@lists.open-mpi.org> wrote:

> Hi,
>
> My job is behaving differently on its two nodes, refusing to
> MPI_Comm_spawn() a process on one of them but succeeding on the other.
...
> Data for node: n002  Num slots: 3  ... Bound: N/A
> Data for node: n001  Num slots: 3  ... Bound:
> socket 0[core 0[hwt 0]]:[B/././././././././.][./././././././././.]
...
> Why is the Bound output different between n001 and n002?

Without knowing more details (like what exact openmpi, how exactly did you
try to launch) etc. you're not likely to get good answers.

But it does seem clear that the process/rank to hardware (core) pinning
happened on one but not the other node.

This suggests a broken install and/or environment and/or non-standard launch.

/Peter K
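
As mentioned above, here is a simplified sketch of the kind of MPI_Comm_spawn
call involved (not my exact code -- the "./worker" executable, the "n002" host
key, and the error handling are placeholders):

// spawn_sketch.cpp -- illustrative only; names and paths are placeholders.
#include <mpi.h>
#include <cstdio>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    // Have spawn errors returned on the communicator used for the spawn,
    // rather than aborting the whole job.
    MPI_Comm_set_errhandler(MPI_COMM_SELF, MPI_ERRORS_RETURN);

    // Ask for the child to be placed on a specific node.
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "n002");   // example hostname, placeholder

    MPI_Comm intercomm;
    int spawn_err = MPI_SUCCESS;
    int rc = MPI_Comm_spawn("./worker",        // placeholder executable name
                            MPI_ARGV_NULL,
                            1,                 // one child process
                            info,
                            0,                 // root rank performing the spawn
                            MPI_COMM_SELF,
                            &intercomm,
                            &spawn_err);       // one error code per spawned process

    if (rc != MPI_SUCCESS || spawn_err != MPI_SUCCESS)
        std::fprintf(stderr, "MPI_Comm_spawn failed: rc=%d err=%d\n", rc, spawn_err);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The MPI_ERRORS_RETURN handler is there so a failed spawn is reported instead of
killing the job, which seems consistent with running under --enable-recovery.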