Then those flags are correct. I suspect mpirun is executing on n006, yes? The "location verified" flag just means that the daemon of rank N reported back from the node we expected it to be on - Slurm and Cray sometimes renumber the ranks. Torque doesn't, so you should never see a problem there. Since mpirun isn't launched by a daemon, its node is never "verified", though I probably should alter that as it is obviously in the "right" place.
I don't know what you mean by your app not behaving correctly on the remote nodes - best guess is that perhaps some envar they need isn't being forwarded?

On Apr 14, 2020, at 2:04 AM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov> wrote:

CentOS, Torque.

From: Ralph Castain <r...@open-mpi.org>
Sent: Monday, April 13, 2020 5:44 PM
To: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov>
Subject: [EXTERNAL] Re: [OMPI users] Meaning of mpiexec error flags

What kind of system are you running on? Slurm? Cray? ...?

On Apr 13, 2020, at 3:11 PM, Mccall, Kurt E. (MSFC-EV41) <kurt.e.mcc...@nasa.gov> wrote:

Thanks Ralph. So the difference between the working node's flags (0x11) and the non-working nodes' flags (0x13) is the flag PRRTE_NODE_FLAG_LOC_VERIFIED. What does that imply? That the location of the daemon has NOT been verified?

Kurt

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Ralph Castain via users
Sent: Monday, April 13, 2020 4:47 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Ralph Castain <r...@open-mpi.org>
Subject: [EXTERNAL] Re: [OMPI users] Meaning of mpiexec error flags

I updated the message to explain the flags (instead of a numerical value) for OMPI v5. In brief:

#define PRRTE_NODE_FLAG_DAEMON_LAUNCHED   0x01   // whether or not the daemon on this node has been launched
#define PRRTE_NODE_FLAG_LOC_VERIFIED      0x02   // whether or not the location has been verified - used for
                                                 // environments where the daemon's final destination is uncertain
#define PRRTE_NODE_FLAG_OVERSUBSCRIBED    0x04   // whether or not this node is oversubscribed
#define PRRTE_NODE_FLAG_MAPPED            0x08   // whether we have been added to the current map
#define PRRTE_NODE_FLAG_SLOTS_GIVEN       0x10   // the number of slots was specified - used only in non-managed environments
#define PRRTE_NODE_NON_USABLE             0x20   // the node is hosting a tool and is NOT to be used for jobs

On Apr 13, 2020, at 2:15 PM, Mccall, Kurt E. (MSFC-EV41) via users <users@lists.open-mpi.org> wrote:

My application is behaving correctly on node n006 and incorrectly on the lower-numbered nodes. The flags in the error message below may give a clue as to why. What is the meaning of the flag values 0x11 and 0x13?

====================== ALLOCATED NODES ======================
    n006: flags=0x11 slots=3 max_slots=0 slots_inuse=2 state=UP
    n005: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP
    n004: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP
    n003: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP
    n002: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP
    n001: flags=0x13 slots=3 max_slots=0 slots_inuse=1 state=UP

I'm using OpenMPI 4.0.3.

Thanks,
Kurt
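To spell out the two values seen in the table above: a minimal C sketch that expands a flags word into names. The constants are copied from Ralph's list; decode_node_flags is an illustrative helper written for this thread, not an Open MPI API.

#include <stdio.h>

/* Constants copied from the PRRTE definitions quoted above. */
#define PRRTE_NODE_FLAG_DAEMON_LAUNCHED 0x01
#define PRRTE_NODE_FLAG_LOC_VERIFIED    0x02
#define PRRTE_NODE_FLAG_OVERSUBSCRIBED  0x04
#define PRRTE_NODE_FLAG_MAPPED          0x08
#define PRRTE_NODE_FLAG_SLOTS_GIVEN     0x10
#define PRRTE_NODE_NON_USABLE           0x20

/* Illustrative helper (not part of Open MPI): print the names of the bits set in `flags`. */
static void decode_node_flags(unsigned int flags)
{
    printf("0x%02x =", flags);
    if (flags & PRRTE_NODE_FLAG_DAEMON_LAUNCHED) printf(" DAEMON_LAUNCHED");
    if (flags & PRRTE_NODE_FLAG_LOC_VERIFIED)    printf(" LOC_VERIFIED");
    if (flags & PRRTE_NODE_FLAG_OVERSUBSCRIBED)  printf(" OVERSUBSCRIBED");
    if (flags & PRRTE_NODE_FLAG_MAPPED)          printf(" MAPPED");
    if (flags & PRRTE_NODE_FLAG_SLOTS_GIVEN)     printf(" SLOTS_GIVEN");
    if (flags & PRRTE_NODE_NON_USABLE)           printf(" NON_USABLE");
    printf("\n");
}

int main(void)
{
    decode_node_flags(0x11); /* n006 in the table above */
    decode_node_flags(0x13); /* n001 through n005 */
    return 0;
}

Running it prints

0x11 = DAEMON_LAUNCHED SLOTS_GIVEN
0x13 = DAEMON_LAUNCHED LOC_VERIFIED SLOTS_GIVEN

so the only bit separating n006 from the other nodes is LOC_VERIFIED, consistent with Ralph's note at the top of the thread: the node hosting mpirun is never marked "verified" because no daemon is launched onto it.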