Ralph Castain <r...@open-mpi.org> writes:

>> I think there's a problem with documentation at least not being
>> explicit, and it would really help to have it clarified unless I'm
>> missing something.
>
> Not quite sure I understand this comment - the problem is that we
> aren’t correctly reading the allocation, as evidenced by when the user
> ran with --display-allocation. From what we can see, it looks like the
> PE_HOSTFILE may be containing some unexpected characters that make us
> think we hit EOF at the end of the first line, thus ignoring the
> second node.

I suspect that the environment variables Reuti listed are trashed, but
without printing the environment and the contents of $PE_HOSTFILE, it's
only a guess.
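
Something along these lines, called from inside the job script, would
show both. It is only a sketch: the names odd_bytes and report are
made up for illustration, and the only assumption about the file named
by $PE_HOSTFILE is that it is newline-separated text. Values are
printed with repr() so that stray control or non-ASCII characters show
up instead of being swallowed by the terminal.

```python
import os


def odd_bytes(data):
    """Return (line, column, byte value) for every byte that is neither
    printable ASCII nor a tab.  Newlines only act as line separators."""
    hits = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte != 0x09 and not 0x20 <= byte <= 0x7E:
                hits.append((lineno, col, byte))
    return hits


def report(environ):
    """Build a list of text lines describing the environment and,
    if PE_HOSTFILE is set, the raw contents of the file it names."""
    lines = [f"{name}={environ[name]!r}" for name in sorted(environ)]
    path = environ.get("PE_HOSTFILE")
    if path is None:
        lines.append("PE_HOSTFILE is not set")
        return lines
    # Read as bytes so nothing is decoded or normalised on the way in.
    with open(path, "rb") as f:
        data = f.read()
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        lines.append(f"{lineno}: {line!r}")
    for lineno, col, byte in odd_bytes(data):
        lines.append(
            f"unexpected byte 0x{byte:02x} at line {lineno}, column {col}"
        )
    return lines
```

Adding print("\n".join(report(os.environ))) at the end and running it
just before mpirun would put the result in the job's output file.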

But on the face of it (ignoring the diagnostics) the observed
"oversubscription" still seems consistent with what documentation there
is.  I can't see where it says what the correct mapping behaviour is
when the mpirun command doesn't specify one.

>> 
>> Although there's probably more to it in this case, the behaviour seemed
>> consistent with what I deduced (without reading the code) from the doc,
>> ompi_info, and experiment that at least wasn't inconsistent:  the node
>> has 32 processing units, and the default allocation is by socket,
>> apparently round-robin within nodes.  I can't check the actual behaviour
>> in that case just now.
