Noam Bernstein wrote:

Hi all - we have a new Nehalem cluster (dual quad core), and SMT is
enabled in the BIOS (for now).  I do want to do benchmarking on our
applications, obviously, but I was also wondering what happens if I just
set the number of slots to 8 in SGE and just let things run.  In
particular, how will things be laid out if I do "mpirun --mca
mpi_paffinity_alone 1"?

The processes get bound to logical processors 0, 1, 2, 3, 4, 5, etc., in that order.  As usual.

1. Will it be clever enough to schedule each process on its own core,
    and only resort to the second SMT virtual core if I go over 8
    processes per node (dual quad core)?

No. "Clever" is not part of mpi_paffinity_alone semantics. The semantics are 0, 1, 2, 3, etc. What that means with respect to cores, sockets, hardware threads, etc., depends on how your BIOS numbers these things. It could be "good". It could be "bad" (e.g., doubling subscribing a core before moving on to the next one).

2.  If it's not that clever, can I pass a rank file?

Yes.
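
For example, a rankfile pinning 8 ranks to the 8 physical cores of a dual-socket quad-core node might look something like this (a sketch: "node01" is a placeholder hostname, and the slot=socket:core syntax should be double-checked against the mpirun(1) man page for your version):

rank 0=node01 slot=0:0
rank 1=node01 slot=0:1
rank 2=node01 slot=0:2
rank 3=node01 slot=0:3
rank 4=node01 slot=1:0
rank 5=node01 slot=1:1
rank 6=node01 slot=1:2
rank 7=node01 slot=1:3

You'd pass it with something like "mpirun -rf myrankfile -np 8 ...".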

3. If I do have to do that, what is the mapping between core numbers
     and processor/core/SMT virtual cores?

Depends on your BIOS, I think. Take a look at /proc/cpuinfo. Here is one example:

$ grep "physical id" /proc/cpuinfo
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
physical id     : 0
physical id     : 1
$ grep "core id" /proc/cpuinfo
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3
core id         : 0
core id         : 0
core id         : 1
core id         : 1
core id         : 2
core id         : 2
core id         : 3
core id         : 3

In this case, sequential binding takes you round-robin between the sockets (physical id), filling up the cores on each socket as you go. Only after the first 8 processes do you revisit cores (i.e., start landing on the second hardware thread of cores that are already in use). So, that's a "good" numbering.
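
If you want the logical processor number, socket, and core side by side instead of in separate greps, something like this lines them up (standard /proc/cpuinfo keys; output trimmed to the first few lines for illustration):

$ grep -E "^processor|^physical id|^core id" /proc/cpuinfo | paste - - -
processor : 0   physical id : 0   core id : 0
processor : 1   physical id : 1   core id : 0
processor : 2   physical id : 0   core id : 1
...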

Starting in OMPI 1.3.4, there is "improved" binding support, but it's not aware of hardware threads. If you're okay using only one thread per core, that may be fine for you. You could run with "mpirun -bysocket -bind-to-socket". If you need to use more than one thread per core, however, that won't do the job for you. You'd have to use rankfiles or something.
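
For the one-thread-per-core case, the invocation would look something like this ("./my_app" is a placeholder; -report-bindings, if your build has it, shows where each rank ended up):

$ mpirun -np 8 -bysocket -bind-to-socket -report-bindings ./my_app

That distributes the 8 ranks round-robin across the two sockets and binds each one to its socket; within a socket the OS scheduler will normally spread them one per core as long as you don't oversubscribe.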
