As I said, in the absence of a hostfile, -host assigns ONE slot for each time a 
host is named. So the equivalent hostfile would have "slots=1" for each host to 
create the same pattern as your -host command line.
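
For example, your "-host sunpc0,sunpc1" line would translate to a hostfile 
along these lines:

  sunpc0 slots=1
  sunpc1 slots=1

and pointing -hostfile at that file should give the same mapping as your 
-host command line.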


On Oct 3, 2012, at 7:12 AM, Siegmar Gross 
<siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
> 
> I thought that "slot" is the smallest manageable entity so that I
> must set "slot=4" for a dual-processor dual-core machine with one
> hardware-thread per core. Today I learned about the new keyword
> "sockets" for a hostfile (I didn't find it in "man orte_hosts").
> How would I specify a system with two dual-core processors so that
> "mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 
> -cpus-per-proc 2 -bind-to-core hostname" or even
> "mpiexec -report-bindings -hostfile host_sunpc0_1 -np 2 
> -cpus-per-proc 4 -bind-to-core hostname" would work in the same way
> as the commands below?
> 
> tyr fd1026 217 mpiexec -report-bindings -host sunpc0,sunpc1 -np 2 \
>  -cpus-per-proc 4 -bind-to-core hostname
> [sunpc0:11658] MCW rank 0 bound to socket 0[core 0-1]
>  socket 1[core 0-1]: [B B][B B]
> sunpc0
> [sunpc1:00553] MCW rank 1 bound to socket 0[core 0-1]
>  socket 1[core 0-1]: [B B][B B]
> sunpc1
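
To make the hostfile behave like the -host line for both of the commands above, 
the "slots=1" file sketched at the top should work: with one slot per host, 
-np 4 oversubscribes each node with 2 ranks, and -np 2 puts one rank on each 
node. If you would rather keep "slots=4", one untested possibility (assuming 
your Open MPI build supports the option) is to fix the rank count per node 
explicitly with -npernode, e.g.:

  mpiexec -report-bindings -hostfile host_sunpc0_1 -npernode 2 -np 4 \
    -cpus-per-proc 2 -bind-to-core hostname

  mpiexec -report-bindings -hostfile host_sunpc0_1 -npernode 1 -np 2 \
    -cpus-per-proc 4 -bind-to-core hostname

so the mapper spreads the ranks across sunpc0 and sunpc1 instead of filling 
the first node.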
> 
> 
> Thank you very much for your help in advance.
> 
> 
> Kind regards
> 
> Siegmar
> 
> 
> 
>>> I noticed another problem with process bindings. The command
>>> works if I use "-host", but it breaks if I use "-hostfile" with 
>>> the same machines.
>>> 
>>> tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
>>> -cpus-per-proc 2 -bind-to-core hostname
>>> sunpc1
>>> [sunpc1:00086] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
>>> [sunpc1:00086] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]
>>> sunpc0
>>> [sunpc0:10929] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
>>> sunpc0
>>> [sunpc0:10929] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
>>> sunpc1
>>> 
>>> 
>> 
>> Yes, this works because you told us there is only ONE slot on each
>> host. As a result, we split the 4 processes across the two hosts
>> (both of which are now oversubscribed), resulting in TWO processes
>> running on each host. Since there are 4 cores on each host, and
>> you asked for 2 cores/process, we can make this work.
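
Spelled out, the arithmetic for the -host case is:

  -host sunpc0,sunpc1   -> 1 slot per host
  -np 4                 -> 2 ranks per host (both hosts oversubscribed)
  -cpus-per-proc 2      -> 2 ranks x 2 cores = 4 cores per host, which matches
                           the 4 cores each machine actually has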
>> 
>> 
>>> tyr fd1026 179 cat host_sunpc0_1 
>>> sunpc0 slots=4
>>> sunpc1 slots=4
>>> 
>>> 
>>> tyr fd1026 180 mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 \
>>> -cpus-per-proc 2 -bind-to-core hostname
>> 
>> And this will of course not work. In your hostfile, you told us there
>> are FOUR slots on each host. Since the default is to map by slot, we
>> correctly mapped all four processes to the first node. We then tried
>> to bind 2 cores to each process, which requires 8 cores on that node,
>> more than the 4 it actually has.
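
If the goal is to keep "slots=4" and still run 4 ranks with 2 cores each, one 
option worth trying (a sketch, not verified on this setup) is to map 
round-robin by node so the ranks are split across both machines:

  mpiexec -report-bindings -hostfile host_sunpc0_1 -bynode -np 4 \
    -cpus-per-proc 2 -bind-to-core hostname

With -bynode the mapper places ranks node by node, so sunpc0 and sunpc1 should 
each get 2 ranks and the per-node core count works out.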
>> 
>> 
>>> --------------------------------------------------------------------------
>>> An invalid physical processor ID was returned when attempting to bind
>>> an MPI process to a unique processor.
>>> 
>>> This usually means that you requested binding to more processors than
>>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>>> M).  Double check that you have enough unique processors for all the
>>> MPI processes that you are launching on this host.
>>> 
>>> Your job will now abort.
>>> --------------------------------------------------------------------------
>>> sunpc0
>>> [sunpc0:10964] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
>>> sunpc0
>>> [sunpc0:10964] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
>>> --------------------------------------------------------------------------
>>> mpiexec was unable to start the specified application as it encountered
>>> an error
>>> on node sunpc0. More information may be available above.
>>> --------------------------------------------------------------------------
>>> 4 total processes failed to start
>>> 
>>> 
>>> Perhaps this error is related to the other errors. Thank you very
>>> much for any help in advance.
>>> 
>>> 
>>> Kind regards
>>> 
>>> Siegmar
>>> 
>> 
>> 
> 

