It looks like splitting by NUMA is broken, while other split types such as
SOCKET and L3CACHE work fine. A quick look at
opal_hwloc_base_get_relative_locality() and friends tells me that those
functions were not properly updated for the hwloc 2.0 NUMA changes. I'll
try to understand what's going on tomorrow.
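
To make the symptom concrete, here is a minimal sketch in plain C (no MPI
calls) that models a split as "ranks with the same color share a
sub-communicator". The helper names and the rank-to-NUMA layout are
hypothetical, chosen only for illustration; this is not the Open MPI code
path.

```c
#include <assert.h>
#include <stddef.h>

/* Number of distinct colors among the ranks, i.e. the number of
 * sub-communicators a color-based split would create. */
size_t count_groups(const int *colors, size_t nranks)
{
    assert(colors != NULL || nranks == 0);
    size_t groups = 0;
    for (size_t i = 0; i < nranks; i++) {
        int seen = 0;
        /* A color already used by a lower rank does not open a new group. */
        for (size_t j = 0; j < i; j++) {
            if (colors[j] == colors[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            groups++;
    }
    return groups;
}

/* Expected behaviour: each rank's color is the NUMA node it is bound to.
 * numa_of_rank is a hypothetical binding supplied by the caller. */
void expected_colors(const int *numa_of_rank, int *colors, size_t nranks)
{
    for (size_t i = 0; i < nranks; i++)
        colors[i] = numa_of_rank[i];
}

/* Reported behaviour: every rank ends up alone, which is equivalent to
 * using the rank itself as the color. */
void observed_colors(int *colors, size_t nranks)
{
    for (size_t i = 0; i < nranks; i++)
        colors[i] = (int)i;
}
```

With 8 ranks bound as {0,0,0,0,1,1,1,1}, count_groups() gives 2 for the
expected colors and 8 for the observed ones, which matches the report
below.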

Rebuilding OMPI with an external hwloc 1.11.x might avoid the issue in
the meantime.

Beware that splitting on NUMA might become meaningless on some platforms
in the future (there are already some x86 platforms where some NUMA
nodes are attached to the Packages while others are attached to each
half of the same Packages).

Brice


On 26/11/2019 at 23:12, Hatem Elshazly via users wrote:
> Hello,
>
>
> I'm trying to split the world communicator by NUMA using
> MPI_Comm_split_type. I expected to get as many sub-communicators as
> there are NUMA nodes, but what I get is as many sub-communicators as
> there are MPI processes, each containing one process.
>
>
> Attached is a reproducer. I tried it with version 4.0.2 built with
> GNU 9.2.0 on Skylake and Haswell machines, and both behave
> similarly.
>
>
> Can anyone explain why it behaves like that? Is this expected,
> or am I confusing something?
>
>
> Thanks in advance,
>
> Hatem
>
> Junior Researcher -- Barcelona Supercomputing Center (BSC)