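It looks like splitting on NUMA is broken, while other split types such as SOCKET and L3CACHE work fine. A quick look at opal_hwloc_base_get_relative_locality() and friends tells me that those functions were not properly updated for the hwloc 2.0 NUMA changes (in hwloc 2.0, NUMA nodes are no longer part of the main object tree; they are attached to it as memory children). I'll try to understand what's going on tomorrow.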
Rebuilding OMPI with an external hwloc 1.11.x might avoid the issue in the meantime.

Beware that splitting on NUMA might become meaningless on some platforms in the future (there are already some x86 platforms where some NUMA nodes are attached to the Packages while others are attached to each half of the same Packages).

Brice

On 26/11/2019 at 23:12, Hatem Elshazly via users wrote:
> Hello,
>
> I'm trying to split the world communicator by NUMA using
> MPI_Comm_split_type. I expected to get as many sub communicators as
> the NUMA nodes, but what I get is as many sub communicator as the
> number of mpi processes each containing one process.
>
> Attached is a reproducer code. I tried it using version 4.0.2 built
> with GNU 9.2.0 on a skyline and haswell machines and both behave
> similarly.
>
> Can anyone point me to why does it behave like that? Is this expected
> or am I confusing something?
>
> Thanks in advance,
>
> Hatem
>
> Junior Researcher -- Barcelona Supercomputing Center (BSC)
>
> http://bsc.es/disclaimer