Hi, I am doing research on parallel techniques for shared-memory (NUMA) systems. I understand that Open MPI is able to take advantage of shared memory and that it uses processor affinity. Is Open MPI's design of MPI_Allreduce the same for shared-memory (NUMA) systems as for distributed systems? Could someone briefly describe the MPI_Allreduce design in terms of the processes and how they interact on shared memory? Otherwise, please suggest a good reference for this.
-Thanks, Sarang.