Hi,

I tried to reproduce the bindings from the following blog
http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options
on a machine with two dual-core processors (two sockets with two cores
each) and openmpi-1.6.2. I have sorted the lines and removed the output
of "hostname" so that the bindings are easier to see.
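
(In the binding maps shown below, each bracket pair stands for one
socket, "B" marks a core to which the process is bound, and "." marks a
core to which it is not bound.)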

mpiexec -report-bindings -host sunpc0 -np 4 -bind-to-socket hostname
[sunpc0:05410] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05410] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05410] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
[sunpc0:05410] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]

The output is consistent with the illustration in the above blog.
Now I add one more machine.

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
  -bind-to-socket hostname
[sunpc0:06015] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25543] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06015] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25543] MCW rank 3 bound to socket 0[core 0-1]: [B B][. .]

I would have expected the same output as before (all four processes on
sunpc0) and not a distribution of the processes across both nodes. Did I
misunderstand the concept, so that this output is correct? When I try
"-bysocket" with one machine, I once more get output that is consistent
with the above blog. (Before that example, I sketch below the hostfile
that I thought "-host sunpc0,sunpc1" would correspond to.)
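
The following is only a sketch (the file name "myhosts" is just an
example), and I have not verified that "-host" really implies these
slot counts, which may well be the source of my misunderstanding.

myhosts:
sunpc0 slots=4
sunpc1 slots=4

mpiexec -report-bindings -hostfile myhosts -np 4 -bind-to-socket hostname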

mpiexec -report-bindings -host sunpc0 -np 4 -bysocket \
  -bind-to-socket hostname
[sunpc0:05451] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05451] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
[sunpc0:05451] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05451] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]
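
Just to check that I understand the difference between mapping and
binding: I assume that combining "-bysocket" with "-bind-to-core", e.g.

mpiexec -report-bindings -host sunpc0 -np 4 -bysocket \
  -bind-to-core hostname

would bind each process to a single core in an alternating fashion, so
that rank 0 gets "[B .][. .]" and rank 1 gets "[. .][B .]". I have not
verified this, so please correct me if that assumption is already wrong.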

However, when I add one more machine to the "-bysocket" test, I once
more get unexpected output instead of the result that I expected from
the single-machine case.

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 -bysocket \
  -bind-to-socket hostname
[sunpc0:06130] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25660] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06130] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
[sunpc1:25660] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]

I would only have expected the processes to be distributed across both
nodes if I had used "-bynode" (as in the following example).

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 -bynode \
  -bind-to-socket hostname
[sunpc0:06171] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25696] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06171] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25696] MCW rank 3 bound to socket 0[core 0-1]: [B B][. .]
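
If I wanted to state explicitly how many processes run on each node, I
assume I could also use "-npernode", e.g.

mpiexec -report-bindings -host sunpc0,sunpc1 -npernode 2 \
  -bind-to-socket hostname

but I have not tested this combination, so I do not know whether the
bindings would differ from the "-bynode" case above.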


Option "-npersocket" doesnt't work, even if I reduce "-npersocket"
to "1". Why doesn't it find any sockets, although the above commands
could find both sockets?

mpiexec -report-bindings -host sunpc0 -np 2 -npersocket 1 hostname
--------------------------------------------------------------------------
Your job has requested a conflicting number of processes for the
application:

App: hostname
number of procs:  2

This is more processes than we can launch under the following
additional directives and conditions:

number of sockets:   0
npersocket:   1

Please revise the conflict and try again.
--------------------------------------------------------------------------
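
I assume that I could work around this and bind the processes to the
sockets explicitly with a rankfile, e.g. a file "myrankfile" (only an
example name) containing

rank 0=sunpc0 slot=0:0-1
rank 1=sunpc0 slot=1:0-1

together with

mpiexec -report-bindings -rf myrankfile -np 2 hostname

but I have not tried it yet, and it would not explain why "-npersocket"
cannot find the sockets.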


By the way, I get the same output if I use Linux instead of Solaris.
I would be grateful if somebody could clarify whether I have
misunderstood the binding concept or whether the bindings are wrong
when more than one machine is used. Thank you very much in advance for
any comments.


Kind regards

Siegmar
