Hi,

I tried to reproduce the bindings from the following blog post on a machine with two dual-core processors and openmpi-1.6.2:

http://blogs.cisco.com/performance/open-mpi-v1-5-processor-affinity-options

I have ordered the lines and removed the output from "hostname" so that it is easier to see the bindings.
mpiexec -report-bindings -host sunpc0 -np 4 -bind-to-socket hostname
[sunpc0:05410] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05410] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05410] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
[sunpc0:05410] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]

The output is consistent with the illustration in the above blog. Now I add one more machine.

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
  -bind-to-socket hostname
[sunpc0:06015] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25543] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06015] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25543] MCW rank 3 bound to socket 0[core 0-1]: [B B][. .]

I would have expected the same output as before and not a distribution of the processes across both nodes. Did I misunderstand the concept, so that this output is correct?

When I try "-bysocket" with one machine, I once more get output consistent with the above blog.

mpiexec -report-bindings -host sunpc0 -np 4 -bysocket \
  -bind-to-socket hostname
[sunpc0:05451] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05451] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
[sunpc0:05451] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:05451] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]

However, when I add one more machine, I again get unexpected output instead of the output I expected from above.

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 -bysocket \
  -bind-to-socket hostname
[sunpc0:06130] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25660] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06130] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
[sunpc1:25660] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]

I would have expected the processes to be distributed across all nodes only if I had used "-bynode" (as in the following example).

mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 -bynode \
  -bind-to-socket hostname
[sunpc0:06171] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25696] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
[sunpc0:06171] MCW rank 2 bound to socket 0[core 0-1]: [B B][. .]
[sunpc1:25696] MCW rank 3 bound to socket 0[core 0-1]: [B B][. .]

Option "-npersocket" doesn't work, even if I reduce "-npersocket" to "1". Why doesn't it find any sockets, although the commands above could find both sockets?

mpiexec -report-bindings -host sunpc0 -np 2 -npersocket 1 hostname
--------------------------------------------------------------------------
Your job has requested a conflicting number of processes for the
application:

  App: hostname
  number of procs:  2

This is more processes than we can launch under the following
additional directives and conditions:

  number of sockets:   0
  npersocket:   1

Please revise the conflict and try again.
--------------------------------------------------------------------------

By the way, I get the same output if I use Linux instead of Solaris.

I would be grateful if somebody could clarify whether I have misunderstood the binding concept or whether the binding is wrong when I use more than one machine. Thank you very much for any comments in advance.

Kind regards

Siegmar
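
P.S.: In case it is useful, instead of "hostname" one could also start a small test program like the one below, which prints the CPU affinity mask of each rank. This is only a rough sketch, not something from the blog: it assumes Linux (sched_getaffinity() and the CPU_* macros are Linux/glibc-specific, so it does not apply to the Solaris nodes), and the name "show_binding.c" is just a placeholder.

/* show_binding.c - rough sketch: print the CPU affinity mask of each
 * MPI rank (Linux-specific, uses sched_getaffinity(); the file name
 * is only a placeholder). */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  int rank, cpu;
  char host[256];
  char cpus[8 * CPU_SETSIZE] = "";
  cpu_set_t mask;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  gethostname(host, sizeof(host));

  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    /* collect the logical CPU numbers this process may run on */
    for (cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
      if (CPU_ISSET(cpu, &mask)) {
        char tmp[16];
        snprintf(tmp, sizeof(tmp), "%d ", cpu);
        strcat(cpus, tmp);
      }
    }
    printf("rank %d on %s may run on CPUs: %s\n", rank, host, cpus);
  } else {
    perror("sched_getaffinity");
  }

  MPI_Finalize();
  return 0;
}

It could be compiled with "mpicc show_binding.c -o show_binding" and then launched instead of "hostname" in the commands above, e.g. "mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 -bysocket -bind-to-socket show_binding".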