Hi, I am doing research on parallel computing on shared memory with a NUMA architecture. The system is a 4-node AMD Opteron, with each node being a dual-core. I am testing an OpenMPI program with MPI-nodes <= MAX cores available on the system (in my case 4*2=8). Can someone tell me whether, in such cases (where MPI-nodes <= MAX cores on a shared-memory system), OpenMPI implements MPI-nodes as processes or as threads, and how this can be determined at run time? I am asking because processes have more overhead than lightweight threads.
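To make the question concrete: the only run-time check I can think of is comparing operating-system process IDs, since threads of one process share a PID while separate processes do not. Below is a minimal sketch of that idea in plain Python on a POSIX system, without MPI. The helper names are my own (hypothetical), and I am only assuming that the same comparison would carry over if each MPI-node printed its PID after initialization.

```python
import os
import threading


def pid_seen_by_thread():
    """Return the PID reported from inside a worker thread."""
    result = []
    worker = threading.Thread(target=lambda: result.append(os.getpid()))
    worker.start()
    worker.join()
    return result[0]


def pid_seen_by_child_process():
    """Return the PID reported from inside a forked child process (POSIX only)."""
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: report our own PID through the pipe, then exit immediately.
        os.close(read_fd)
        os.write(write_fd, str(os.getpid()).encode())
        os._exit(0)
    # Parent: read the PID the child reported, then reap the child.
    os.close(write_fd)
    data = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(child, 0)
    return int(data)


if __name__ == "__main__":
    me = os.getpid()
    # A thread lives inside this process, so it reports the same PID.
    print("thread shares my PID: ", pid_seen_by_thread() == me)
    # A forked child is a separate process, so it reports a different PID.
    print("process shares my PID:", pid_seen_by_child_process() == me)
```

If every MPI-node reported a distinct PID, I would take that as a sign that they run as separate processes; identical PIDs would suggest threads inside one process.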
-Thanks and Regards, Sarang.