I've heard this from a couple of other sources - it looks like there is a 
problem in the daemons when they compute the location for -cpus-per-proc. I'm 
not entirely sure why that would be, as the code is supposed to be common with 
mpirun, but there are a few differences.

I will take a look at it - I don't know of any workaround, I'm afraid.

On Mar 21, 2013, at 12:01 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Dear Open MPI Pros
> 
> I am having trouble using mpiexec with --cpus-per-proc
> on multiple nodes in OMPI 1.6.4.
> 
> I know there is an ongoing thread on similar runtime issues
> of OMPI 1.7.
> By no means am I trying to hijack T. Mishima's questions.
> My question is genuine, though, and perhaps related to his.
> 
> I am testing a new cluster remotely, with monster
> dual-socket 16-core AMD Bulldozer processors (32 cores per node).
> I am using OMPI 1.6.4 built with Torque 4.2.1 support.
> 
> I read that on these processors each pair of cores shares an FPU.
> Hence, I am trying to run *one MPI process* on each
> *pair of successive cores*.
> This trick seems to yield better performance
> (at least for HPL/Linpack) than using all cores.
> I.e., the goal is to use "every other core", or perhaps
> to allow each process to wobble across two successive cores only,
> hence granting exclusive use of one FPU per process.
> [BTW, this is *not* an attempt to do hybrid MPI+OpenMP.
> The code is HPL with MPI+BLAS/Lapack and NO OpenMP.]
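> 
> [In case it helps: the core pairing is visible in the node topology.
> Running hwloc's lstopo on a node shows each pair of cores under a
> shared L2 cache (the cache that goes with the shared FPU), roughly
> like this - heavily abridged and from memory, so take the details
> with a grain of salt:
> 
>    $ lstopo
>    ...
>      L2 L#0 (2048KB)
>        Core L#0 + PU L#0 (P#0)
>        Core L#1 + PU L#1 (P#1)
>    ...
> ]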
> 
> To achieve this, I am using the mpiexec --cpus-per-proc option.
> It works on one node, which is great.
> However, unless I made a silly syntax or arithmetic mistake,
> it doesn't seem to work on more than one node.
> 
> For instance, this works:
> 
> #PBS -l nodes=1:ppn=32
> ...
> mpiexec -np 16 \
>    --cpus-per-proc 2 \
>    --bind-to-core \
>    --report-bindings \
>    --tag-output \
> 
> I get a pretty nice process-to-cores distribution, with 16 processes
> and each process bound to a pair of successive cores,
> as expected:
> 
> [1,7]<stderr>:[node33:04744] MCW rank 7 bound to socket 0[core 14-15]: [. . . . . . . . . . . . . . B B][. . . . . . . . . . . . . . . .]
> [1,8]<stderr>:[node33:04744] MCW rank 8 bound to socket 1[core 0-1]: [. . . . . . . . . . . . . . . .][B B . . . . . . . . . . . . . .]
> [1,9]<stderr>:[node33:04744] MCW rank 9 bound to socket 1[core 2-3]: [. . . . . . . . . . . . . . . .][. . B B . . . . . . . . . . . .]
> [1,10]<stderr>:[node33:04744] MCW rank 10 bound to socket 1[core 4-5]: [. . . . . . . . . . . . . . . .][. . . . B B . . . . . . . . . .]
> [1,11]<stderr>:[node33:04744] MCW rank 11 bound to socket 1[core 6-7]: [. . . . . . . . . . . . . . . .][. . . . . . B B . . . . . . . .]
> [1,12]<stderr>:[node33:04744] MCW rank 12 bound to socket 1[core 8-9]: [. . . . . . . . . . . . . . . .][. . . . . . . . B B . . . . . .]
> [1,13]<stderr>:[node33:04744] MCW rank 13 bound to socket 1[core 10-11]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . B B . . . .]
> [1,14]<stderr>:[node33:04744] MCW rank 14 bound to socket 1[core 12-13]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . . . B B . .]
> [1,15]<stderr>:[node33:04744] MCW rank 15 bound to socket 1[core 14-15]: [. . . . . . . . . . . . . . . .][. . . . . . . . . . . . . . B B]
> [1,0]<stderr>:[node33:04744] MCW rank 0 bound to socket 0[core 0-1]: [B B . . . . . . . . . . . . . .][. . . . . . . . . . . . . . . .]
> [1,1]<stderr>:[node33:04744] MCW rank 1 bound to socket 0[core 2-3]: [. . B B . . . . . . . . . . . .][. . . . . . . . . . . . . . . .]
> [1,2]<stderr>:[node33:04744] MCW rank 2 bound to socket 0[core 4-5]: [. . . . B B . . . . . . . . . .][. . . . . . . . . . . . . . . .]
> [1,3]<stderr>:[node33:04744] MCW rank 3 bound to socket 0[core 6-7]: [. . . . . . B B . . . . . . . .][. . . . . . . . . . . . . . . .]
> [1,4]<stderr>:[node33:04744] MCW rank 4 bound to socket 0[core 8-9]: [. . . . . . . . B B . . . . . .][. . . . . . . . . . . . . . . .]
> [1,5]<stderr>:[node33:04744] MCW rank 5 bound to socket 0[core 10-11]: [. . . . . . . . . . B B . . . .][. . . . . . . . . . . . . . . .]
> [1,6]<stderr>:[node33:04744] MCW rank 6 bound to socket 0[core 12-13]: [. . . . . . . . . . . . B B . .][. . . . . . . . . . . . . . . .]
> 
> 
> ***************
> 
> However, when I try to use eight nodes,
> the job fails and I get the error message below (repeatedly from
> several nodes):
> 
> #PBS -l nodes=8:ppn=32
> ...
> mpiexec -np 128 \
>    --cpus-per-proc 2 \
>    --bind-to-core \
>    --report-bindings \
>    --tag-output \
> 
> 
> Error message:
> 
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor on node:
> 
>  Node: node18
> 
> This usually means that you requested binding to more processors than
> exist (e.g., trying to bind N MPI processes to M processors, where N >
> M), or that the node has an unexpectedly different topology.
> 
> Double check that you have enough unique processors for all the
> MPI processes that you are launching on this host, and that all nodes
> have identical topologies.
> 
> Your job will now abort.
> --------------------------------------------------------------------------
> 
> Oddly enough, the binding map *is* shown on STDERR,
> and it looks *correct*, pretty much the same binding map above
> that I get for a single node.
> 
> *****************
> 
> Finally, replacing "--cpus-per-proc 2" with "--npernode 16"
> works to some extent, but doesn't reach my goal.
> I.e., the job doesn't fail, and each node does indeed get 16 MPI
> processes.
> However, it doesn't bind the processes the way I want.
> Regardless of whether I keep "--bind-to-core"
> or replace it with "--bind-to-socket",
> all 16 processes on each node always bind to socket 0,
> and nothing goes to socket 1.
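> 
> For concreteness, the variant that runs but binds wrong was along
> these lines (reconstructed from my notes; executable elided as
> before):
> 
> #PBS -l nodes=8:ppn=32
> ...
> mpiexec -np 128 \
>    --npernode 16 \
>    --bind-to-core \
>    --report-bindings \
>    --tag-output \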
> 
> ************
> 
> Is there any simple workaround for this
> (other than using a --rankfile; see the sketch below)
> to make --cpus-per-proc work across multiple nodes,
> using "every other core"?
> 
> [Only if it is a simple workaround.  I must finish this
> remote test soon.  Otherwise I can revisit this issue later.]
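> 
> [For reference, the rankfile I'd rather avoid would look roughly like
> the sketch below.  The hostnames are hypothetical, with one rank per
> pair of cores, and ranks 16-127 continuing the same pattern on the
> other seven nodes:
> 
>    rank 0=node33 slot=0:0-1
>    rank 1=node33 slot=0:2-3
>    ...
>    rank 8=node33 slot=1:0-1
>    ...
>    rank 16=node34 slot=0:0-1
> 
> launched with something like "mpiexec -np 128 --rankfile my_rankfile
> ...".  Maintaining 128 such lines for every job and node set is what
> I'm trying to avoid.]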
> 
> Thank you,
> Gus Correa
> 