Thanks for responding. Here are some more details. I'm using Open MPI 4.0.2,
compiled with the Portland Group compiler, pgc++ 19.5-0, with the build flags

--enable-mpi-cxx  --enable-cxx-exceptions  --with-tm

PBS/Torque version is 5.1.1.

I launched the job with qsub:

qsub -V -j oe -e ./stdio -o ./stdio -f -X -N MyJob -l nodes=2:ppn=3  
RunMyJob.bash

My abbreviated mpiexec command within RunMyJob.bash was:

mpiexec --enable-recovery -display-map --display-allocation --mca 
mpi_param_check 1  --v --x DISPLAY --np 2  --map-by ppr:1:node <myExecutable>
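
In case it helps, the kind of spawn call I'm making looks roughly like the sketch below. This is illustrative only, not my actual code: the executable name "child_exe" is a placeholder, and each parent rank simply tries to spawn one child through MPI_COMM_SELF.

```c
/* Minimal MPI_Comm_spawn sketch (illustrative only -- not my actual code).
 * Compile: mpicc spawn_parent.c -o spawn_parent
 * Launch:  mpiexec -np 2 --map-by ppr:1:node ./spawn_parent
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Comm intercomm;
    int errcodes[1];

    /* Each parent rank spawns one child; "child_exe" is a placeholder
     * for the real worker executable. */
    int rc = MPI_Comm_spawn("child_exe", MPI_ARGV_NULL, 1,
                            MPI_INFO_NULL, 0, MPI_COMM_SELF,
                            &intercomm, errcodes);
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "MPI_Comm_spawn failed, rc=%d\n", rc);

    MPI_Finalize();
    return 0;
}
```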


-----Original Message-----
From: Peter Kjellström <c...@nsc.liu.se> 
Sent: Thursday, November 21, 2019 3:40 AM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Cc: users@lists.open-mpi.org
Subject: [EXTERNAL] Re: [OMPI users] Please help me interpret MPI output

On Wed, 20 Nov 2019 17:38:19 +0000
"Mccall, Kurt E. (MSFC-EV41) via users" <users@lists.open-mpi.org>
wrote:

> Hi,
> 
> My job is behaving differently on its two nodes, refusing to
> MPI_Comm_spawn() a process on one of them but succeeding on the other.
...
> Data for node: n002    Num slots: 3    ... Bound: N/A
> Data for node: n001    Num slots: 3    ... Bound:
> socket 0[core 0[hwt 0]]:[B/././././././././.][./././././././././.]
...
> Why is the Bound output different between n001 and n002?

Without knowing more details (such as exactly which Open MPI version you're 
running and how exactly you tried to launch), you're not likely to get good 
answers.

But it does seem clear that process/rank-to-core pinning happened on one node 
but not the other.

This suggests a broken install, a broken environment, and/or a non-standard launch.

/Peter K
