Noam Bernstein wrote:

Hi all - we have a new Nehalem cluster (dual quad core), and SMT is
enabled in the BIOS (for now).  I do want to do benchmarking on our
applications, obviously, but I was also wondering what happens if I just
set the number of slots to 8 in SGE and just let things run.  In
particular, how will things be laid out if I do "mpirun --mca
mpi_paffinity_alone 1"?

The processes get bound to logical processors 0, 1, 2, 3, 4, 5, etc., in that order.  As usual.

1. Will it be clever enough to schedule each process on its own core,
    and only resort to the second SMT virtual core if I go over 8
    processes per node (dual quad core)?

No. "Clever" is not part of mpi_paffinity_alone semantics. The semantics are 0, 1, 2, 3, etc. What that means with respect to cores, sockets, hardware threads, etc., depends on how your BIOS numbers these things. It could be "good". It could be "bad" (e.g., doubling subscribing a core before moving on to the next one).

2.  If it's not that clever, can I pass a rank file?

Yes.
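
For example, a rankfile pinning 8 ranks to the 8 physical cores of a dual-socket quad-core node might look something like this (a sketch: "node01" is a placeholder hostname, and the slot=socket:core syntax should be double-checked against the mpirun(1) man page for your version):

rank 0=node01 slot=0:0
rank 1=node01 slot=0:1
rank 2=node01 slot=0:2
rank 3=node01 slot=0:3
rank 4=node01 slot=1:0
rank 5=node01 slot=1:1
rank 6=node01 slot=1:2
rank 7=node01 slot=1:3

You'd pass it with something like "mpirun -rf myrankfile -np 8 ...".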

3. If I do have to do that, what is the mapping between core numbers
     and processor/core/SMT virtual cores?

Depends on your BIOS, I think. Take a look at /proc/cpuinfo. Here is one example:

$ grep "physical id" /proc/cpuinfo
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
$ grep "core id" /proc/cpuinfo
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3

In this case, sequential binding takes you round-robin between the sockets (physical id), filling up the cores on each socket as you go. Only after the first 8 processes do you revisit cores (i.e., start landing on the second hardware thread of cores that are already in use). So, that's a "good" numbering.
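
If you want the logical processor number, socket, and core side by side instead of in separate greps, something like this lines them up (standard /proc/cpuinfo keys; output trimmed to the first few lines for illustration):

$ grep -E "^processor|^physical id|^core id" /proc/cpuinfo | paste - - -
processor : 0   physical id : 0   core id : 0
processor : 1   physical id : 1   core id : 0
processor : 2   physical id : 0   core id : 1
...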

Starting in OMPI 1.3.4, there is "improved" binding support, but it's not aware of hardware threads. If you're okay using only one thread per core, that may be fine for you. You could run with "mpirun -bysocket -bind-to-socket". If you need to use more than one thread per core, however, that won't do the job for you. You'd have to use rankfiles or something.
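
For the one-thread-per-core case, the invocation would look something like this ("./my_app" is a placeholder; -report-bindings, if your build has it, shows where each rank ended up):

$ mpirun -np 8 -bysocket -bind-to-socket -report-bindings ./my_app

That distributes the 8 ranks round-robin across the two sockets and binds each one to its socket; within a socket the OS scheduler will normally spread them one per core as long as you don't oversubscribe.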
