[OMPI users] Communicator Split Type NUMA Behavior

2019-11-26 Thread Hatem Elshazly via users

Hello,


I'm trying to split the world communicator by NUMA using
MPI_Comm_split_type. I expected to get as many sub-communicators as there
are NUMA nodes, but what I get is as many sub-communicators as there are
MPI processes, each containing a single process.



Attached is a reproducer. I tried it with Open MPI 4.0.2 built with
GNU 9.2.0 on Skylake and Haswell machines; both behave the same way.



Can anyone point me to why it behaves like that? Is this expected, or
am I misunderstanding something?



Thanks in advance,

Hatem

Junior Researcher -- Barcelona Supercomputing Center (BSC)



http://bsc.es/disclaimer

#include <stdio.h>
#include "mpi.h"



int main(){
MPI_Init(NULL, NULL);

int world_size, shared_size, numa_size;
int world_rank, shared_rank, numa_rank;
int key = 0;
MPI_Comm shared_comm, numa_comm;

MPI_Comm_size(MPI_COMM_WORLD, &world_size);
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

/* Split by shared-memory domain: one sub-communicator per node */
MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, key, MPI_INFO_NULL, &shared_comm);
MPI_Comm_size(shared_comm, &shared_size);
MPI_Comm_rank(shared_comm, &shared_rank);

/* Split by NUMA node (OMPI_COMM_TYPE_NUMA is an Open MPI extension) */
MPI_Comm_split_type(MPI_COMM_WORLD, OMPI_COMM_TYPE_NUMA, key, MPI_INFO_NULL, &numa_comm);
MPI_Comm_size(numa_comm, &numa_size);
MPI_Comm_rank(numa_comm, &numa_rank);

if(world_rank == 0){
   printf(" WORLD Communicator size: %d\n", world_size);
}


if(shared_rank == 0){
  printf(" SHARED Communicator size: %d\n", shared_size);
}


if(numa_rank == 0){
  printf(" NUMA Communicator size: %d\n", numa_size);
}

MPI_Finalize();

return 0;
}
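
A typical way to build and run the reproducer above (assuming the file is
saved as numa_split.c and the Open MPI compiler wrappers are in PATH; the
file name and process count are placeholders):

```shell
# Compile with the Open MPI C wrapper
mpicc numa_split.c -o numa_split

# Run with, say, 8 processes on one node. With a working NUMA split,
# the printed NUMA communicator size should equal the number of
# processes per NUMA node, not 1.
mpirun -np 8 ./numa_split
```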


Re: [OMPI users] Communicator Split Type NUMA Behavior

2019-11-26 Thread Brice Goglin via users
It looks like NUMA splitting is broken, while other split types such as
SOCKET and L3CACHE work fine. A quick look at
opal_hwloc_base_get_relative_locality() and friends tells me that those
functions were not properly updated for the hwloc 2.0 NUMA changes. I'll
try to understand what's going on tomorrow.

Rebuilding OMPI with an external hwloc 1.11.x might avoid the issue in
the meantime.
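
A sketch of that workaround when building Open MPI from source (configure's
--with-hwloc option points the build at an external hwloc install; the
prefix and hwloc paths below are placeholders):

```shell
# Build against a separately installed hwloc 1.11.x instead of the
# bundled hwloc 2.x
./configure --prefix=$HOME/openmpi-4.0.2-ext-hwloc \
            --with-hwloc=/opt/hwloc-1.11
make -j all && make install
```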

Beware that splitting on NUMA might become meaningless on some platforms
in the future: there are already x86 platforms where some NUMA nodes are
attached to whole packages while others are attached to each half of the
same package.
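
To check how NUMA nodes and packages relate on a given machine, hwloc's
lstopo tool (shipped with hwloc) prints the topology tree:

```shell
# Textual topology summary; NUMANode entries show where memory is
# attached relative to Package/L3/Core objects
lstopo --no-io --of console
```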

Brice

