Thank you for your suggestion.
I am more concerned about the poor performance of the
one-MPI-process-per-socket case.
That model fits my real workload better.
The performance that I see is a lot worse than what the underlying hardware
can support.
The best case (all MPI processes in a single socket) is
My application is behaving correctly on node n006, and incorrectly on the lower
numbered nodes. The flags in the error message below may give a clue as to
why. What is the meaning of the flag values 0x11 and 0x13?
== ALLOCATED NODES ==
n006
I updated the message to explain the flags (instead of a numerical value) for
OMPI v5. In brief:
#define PRRTE_NODE_FLAG_DAEMON_LAUNCHED 0x01 // whether or not the daemon on this node has been launched
#define PRRTE_NODE_FLAG_LOC_VERIFIED    0x02 // whether or not the location
Thanks Ralph. So the difference between the working node flag (0x11) and the
non-working nodes’ flags (0x13) is the flag PRRTE_NODE_FLAG_LOC_VERIFIED.
What does that imply? The location of the daemon has NOT been verified?
Kurt
From: users On Behalf Of Ralph Castain via users
Sent: Mon