As I said, in the absence of a hostfile, -host assigns ONE slot for each time a host is named. So the equivalent hostfile would need "slots=1" on each host to create the same pattern as your -host cmd line. If you would rather keep "slots=4" in the hostfile, one way to make that work is sketched at the end of this mail.
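For example (the filename here is just for illustration), a hostfile like

host_oneslot:
sunpc0 slots=1
sunpc1 slots=1

should be equivalent to "-host sunpc0,sunpc1", so

mpiexec -report-bindings -hostfile host_oneslot -np 4 \
    -cpus-per-proc 2 -bind-to-core hostname

should again split the 4 processes across the two hosts (oversubscribing each one) and bind each process to 2 cores, just like your -host run. I haven't re-run this on your machines, so treat it as a sketch.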
On Oct 3, 2012, at 7:12 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> I thought that "slot" is the smallest manageable entity so that I
> must set "slot=4" for a dual-processor dual-core machine with one
> hardware-thread per core. Today I learned about the new keyword
> "sockets" for a hostfile (I didn't find it in "man orte_hosts").
> How would I specify a system with two dual-core processors so that
> "mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4
> -cpus-per-proc 2 -bind-to-core hostname" or even
> "mpiexec -report-bindings -hostfile host_sunpc0_1 -np 2
> -cpus-per-proc 4 -bind-to-core hostname" would work in the same way
> as the commands below.
>
> tyr fd1026 217 mpiexec -report-bindings -host sunpc0,sunpc1 -np 2 \
>     -cpus-per-proc 4 -bind-to-core hostname
> [sunpc0:11658] MCW rank 0 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B]
> sunpc0
> [sunpc1:00553] MCW rank 1 bound to socket 0[core 0-1] socket 1[core 0-1]: [B B][B B]
> sunpc1
>
> Thank you very much for your help in advance.
>
> Kind regards
>
> Siegmar
>
>>> I recognized another problem with process bindings. The command
>>> works if I use "-host" and it breaks if I use "-hostfile" with
>>> the same machines.
>>>
>>> tyr fd1026 178 mpiexec -report-bindings -host sunpc0,sunpc1 -np 4 \
>>>     -cpus-per-proc 2 -bind-to-core hostname
>>> sunpc1
>>> [sunpc1:00086] MCW rank 1 bound to socket 0[core 0-1]: [B B][. .]
>>> [sunpc1:00086] MCW rank 3 bound to socket 1[core 0-1]: [. .][B B]
>>> sunpc0
>>> [sunpc0:10929] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
>>> sunpc0
>>> [sunpc0:10929] MCW rank 2 bound to socket 1[core 0-1]: [. .][B B]
>>> sunpc1
>>
>> Yes, this works because you told us there is only ONE slot on each
>> host. As a result, we split the 4 processes across the two hosts
>> (both of which are now oversubscribed), resulting in TWO processes
>> running on each host. Since there are 4 cores on each host, and
>> you asked for 2 cores/process, we can make this work.
>>
>>> tyr fd1026 179 cat host_sunpc0_1
>>> sunpc0 slots=4
>>> sunpc1 slots=4
>>>
>>> tyr fd1026 180 mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 \
>>>     -cpus-per-proc 2 -bind-to-core hostname
>>
>> And this will of course not work. In your hostfile, you told us there
>> are FOUR slots on each host. Since the default is to map by slot, we
>> correctly mapped all four processes to the first node. We then tried
>> to bind 2 cores for each process, resulting in 8 cores - which is
>> more than you have.
>>
>>> --------------------------------------------------------------------------
>>> An invalid physical processor ID was returned when attempting to bind
>>> an MPI process to a unique processor.
>>>
>>> This usually means that you requested binding to more processors than
>>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>>> M). Double check that you have enough unique processors for all the
>>> MPI processes that you are launching on this host.
>>>
>>> You job will now abort.
>>> --------------------------------------------------------------------------
>>> sunpc0
>>> [sunpc0:10964] MCW rank 0 bound to socket 0[core 0-1]: [B B][. .]
>>> sunpc0
>>> [sunpc0:10964] MCW rank 1 bound to socket 1[core 0-1]: [. .][B B]
>>> --------------------------------------------------------------------------
>>> mpiexec was unable to start the specified application as it encountered
>>> an error on node sunpc0. More information may be available above.
>>> --------------------------------------------------------------------------
>>> 4 total processes failed to start
>>>
>>> Perhaps this error is related to the other errors. Thank you very
>>> much for any help in advance.
>>>
>>> Kind regards
>>>
>>> Siegmar
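As promised above: if you want to leave "slots=4" in host_sunpc0_1, the other knob is the mapping policy. The default maps by slot, which is why all four processes landed on sunpc0; mapping by node spreads the ranks round-robin across the two hosts, so each host only needs 2 processes x 2 cores = 4 cores. A sketch (assuming your mpiexec accepts the -bynode option - please check "mpiexec --help" on your install, I haven't verified this on your machines):

mpiexec -report-bindings -hostfile host_sunpc0_1 -np 4 \
    -bynode -cpus-per-proc 2 -bind-to-core hostname

That should put ranks 0 and 2 on sunpc0 and ranks 1 and 3 on sunpc1, with the same [B B][. .] / [. .][B B] bindings your -host run reported.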