Ralph Castain <r...@open-mpi.org> writes:

>> I think there's a problem with documentation at least not being
>> explicit, and it would really help to have it clarified unless I'm
>> missing something.
>
> Not quite sure I understand this comment - the problem is that we
> aren't correctly reading the allocation, as evidenced by when the user
> ran with --display-allocation. From what we can see, it looks like the
> PE_HOSTFILE may contain some unexpected characters that make us
> think we hit EOF at the end of the first line, thus ignoring the
> second node.
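To test that theory, a small script run inside the job could dump the allocation with any stray characters made visible. This is a hypothetical sketch, not Open MPI's gridengine reader. It assumes the usual SGE PE_HOSTFILE layout (hostname, slot count, queue, processor range on each line), and the set of "suspicious" characters it flags (CR, NUL, a byte-order mark) is only my guess at what might confuse a line reader.

```python
import os

# Hypothetical diagnostic, not Open MPI's gridengine allocation reader.
# Assumes the usual SGE PE_HOSTFILE layout, one host per line:
#   <hostname> <slots> <queue> <processor-range>

# Characters that could plausibly confuse a line-oriented reader (a guess).
SUSPICIOUS = {"\r": "CR", "\x00": "NUL", "\ufeff": "BOM"}


def parse_pe_hostfile(text):
    """Return (hosts, warnings); hosts is a list of (hostname, slots)."""
    hosts = []
    warnings = []
    # Split only on LF so that any CR stays visible in the line.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for ch, name in SUSPICIOUS.items():
            if ch in line:
                warnings.append(f"line {lineno}: contains {name}")
        fields = line.split()  # whitespace split; also drops a trailing CR
        if not fields:
            continue  # blank line, e.g. after the final newline
        if len(fields) < 2 or not fields[1].isdecimal():
            warnings.append(f"line {lineno}: cannot parse {line!r}")
            continue
        hosts.append((fields[0], int(fields[1])))
    return hosts, warnings


def main():
    # A few standard SGE job variables; add any others of interest.
    for var in ("PE", "NSLOTS", "NHOSTS", "PE_HOSTFILE"):
        print(f"{var}={os.environ.get(var)!r}")
    path = os.environ.get("PE_HOSTFILE")
    if not path:
        print("PE_HOSTFILE is not set; not inside an SGE parallel job?")
        return
    # newline="" disables newline translation, so CRs are not hidden.
    with open(path, encoding="utf-8", errors="replace", newline="") as f:
        hosts, warnings = parse_pe_hostfile(f.read())
    for host, slots in hosts:
        print(f"{host}: {slots} slots")
    print(f"total: {len(hosts)} hosts, {sum(s for _, s in hosts)} slots")
    for w in warnings:
        print("WARNING:", w)


if __name__ == "__main__":
    main()
```

Comparing the reported totals with NSLOTS and NHOSTS should show fairly quickly whether the file itself lists both nodes.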
I suspect that the environment variables Reuti listed are trashed, but without printing the environment and the contents of $PE_HOSTFILE, it's only a guess. On the face of it, though (ignoring the diagnostics), the observed "oversubscription" still seems consistent with what documentation there is. I can't see where it says what the correct mapping behaviour is when the mpirun command doesn't specify one.

>>
>> Although there's probably more to it in this case, the behaviour seemed
>> consistent with what I deduced (without reading the code) from the doc,
>> ompi_info, and experiment, which at least wasn't inconsistent: the node
>> has 32 processing units, and the default allocation is by socket,
>> apparently round-robin within nodes. I can't check the actual behaviour
>> in that case just now.
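To make the "oversubscription" reading concrete, here is a toy model of mapping by socket, assuming ranks go round-robin over a node's sockets until that node's slots are used, then move on to the next node. It is hypothetical and not Open MPI's mapper; the node names, the 16-slot/2-socket layout and the job size are made up for illustration. If only the first line of PE_HOSTFILE is read, every rank lands on node1.

```python
from dataclasses import dataclass

# Toy model (hypothetical), not Open MPI's rmaps code.


@dataclass
class Node:
    name: str
    slots: int
    sockets: int


def map_by_socket(nodes, nprocs):
    """Return (placement, oversubscribed); placement is (rank, node, socket)."""
    total_slots = sum(n.slots for n in nodes)
    if total_slots == 0:
        raise ValueError("allocation has no slots")
    placement = []
    rank = 0
    # Fill each node's slots, alternating sockets; if ranks remain after
    # every slot is used, wrap around the node list again (oversubscribe).
    while rank < nprocs:
        for node in nodes:
            for i in range(node.slots):
                if rank == nprocs:
                    break
                placement.append((rank, node.name, i % node.sockets))
                rank += 1
    return placement, nprocs > total_slots


if __name__ == "__main__":
    both = [Node("node1", 16, 2), Node("node2", 16, 2)]
    first_only = both[:1]  # what we'd see if the second line were dropped
    for alloc in (both, first_only):
        placement, over = map_by_socket(alloc, 32)
        used = sorted({n for _, n, _ in placement})
        print(f"nodes used: {used}, oversubscribed: {over}")
```

Whether the real mpirun wraps around like this or refuses to start depends on its oversubscription policy; the toy model simply wraps to show where the ranks would end up.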