On 28.02.2013 at 08:58, Reuti wrote:

> On 28.02.2013 at 06:55, Ralph Castain wrote:
> 
>> I don't off-hand see a problem, though I do note that your "working" version
>> incorrectly reports the universe size as 2!
> 
> Yes, it was 2 in the case where it was working, i.e. when only the two hostnames
> were given without any dedicated slot count. What should it be in this case -
> "unknown", "infinity"?
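(For reference: the "Total: ... Universe: ..." lines below come from a trivial test
program. A minimal sketch of what mpihello presumably does - just the size of
MPI_COMM_WORLD plus the MPI_UNIVERSE_SIZE attribute; the actual source isn't
attached, so this is only an assumption about its contents:)

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, flag;
    int *usize;                 /* MPI hands back a pointer to the attribute value */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* MPI_UNIVERSE_SIZE: the number of slots mpiexec believes it has available */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &usize, &flag);

    if (rank == 0)
        printf("Total: %d Universe: %d\n", size, flag ? *usize : -1);

    MPI_Finalize();
    return 0;
}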
As an add-on:

a) I tried it again on the command line and still get:

Total: 64 Universe: 2

with a hostfile containing:

node006
node007

b) In a job script under SGE, with Open MPI compiled --with-sge, I get the
following after mangling the hostfile:

#!/bin/sh
#$ -pe openmpi* 128
#$ -l exclusive
cut -f 1 -d" " $PE_HOSTFILE > $TMPDIR/machines
mpiexec -cpus-per-proc 2 -report-bindings -hostfile $TMPDIR/machines -np 64 ./mpihello

Here:

Total: 64 Universe: 128

Maybe the allocation found by SGE and the one from the -hostfile argument on the
command line are getting mixed up here. (As an independent check of the actual
bindings, see the sketch after the quoted mail below.)

-- Reuti


> -- Reuti
> 
> 
>> 
>> I'll have to take a look at this and get back to you on it.
>> 
>> On Feb 27, 2013, at 3:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> 
>>> Hi,
>>> 
>>> I have an issue using the option -cpus-per-proc 2. As I have Bulldozer
>>> machines and I want only one process per FP core, I thought using
>>> -cpus-per-proc 2 would be the way to go. Initially I had this issue inside
>>> GridEngine, but then I tried it outside any queuing system and see exactly
>>> the same behavior.
>>> 
>>> @) Each machine has 4 CPUs, each with 16 integer cores, hence 64 integer
>>> cores per machine in total. The Open MPI version used is 1.6.4.
>>> 
>>> 
>>> a) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>> 
>>> with a hostfile containing only the two lines listing the machines:
>>> 
>>> node006
>>> node007
>>> 
>>> This works as I would like (see working.txt) when initiated on node006.
>>> 
>>> 
>>> b) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>>> 
>>> but with the hostfile changed to carry a slot count, which mimics the
>>> behavior in case of a machinefile parsed out of a queuing system:
>>> 
>>> node006 slots=64
>>> node007 slots=64
>>> 
>>> This fails with:
>>> 
>>> --------------------------------------------------------------------------
>>> An invalid physical processor ID was returned when attempting to bind
>>> an MPI process to a unique processor on node:
>>> 
>>> Node: node006
>>> 
>>> This usually means that you requested binding to more processors than
>>> exist (e.g., trying to bind N MPI processes to M processors, where N >
>>> M), or that the node has an unexpectedly different topology.
>>> 
>>> Double check that you have enough unique processors for all the
>>> MPI processes that you are launching on this host, and that all nodes
>>> have identical topologies.
>>> 
>>> Your job will now abort.
>>> --------------------------------------------------------------------------
>>> 
>>> (see failed.txt)
>>> 
>>> 
>>> b1) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 32 ./mpihello
>>> 
>>> This works, and the found universe is 128 as expected (see only32.txt).
>>> 
>>> 
>>> c) Maybe the machinefile is not parsed in the correct way, so I checked:
>>> 
>>> c1) mpiexec -hostfile machines -np 64 ./mpihello  => works
>>> 
>>> c2) mpiexec -hostfile machines -np 128 ./mpihello => works
>>> 
>>> c3) mpiexec -hostfile machines -np 129 ./mpihello => fails as expected
>>> 
>>> So it got the slot counts the correct way.
>>> 
>>> What am I missing?
>>> 
>>> -- Reuti
>>> 
>>> <failed.txt><only32.txt><working.txt>
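PS: Regarding the bindings mentioned above - as an independent cross-check of what
-report-bindings claims, each rank can also ask the kernel for its own affinity
mask. A minimal, Linux-only sketch (assuming glibc's sched_getaffinity; this is
not part of the original mpihello):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, c;
    cpu_set_t mask;
    char cores[4096] = "";
    char buf[16];
    char host[256];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ask the kernel which cores this rank may actually run on */
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        for (c = 0; c < CPU_SETSIZE; c++) {
            if (CPU_ISSET(c, &mask)) {
                snprintf(buf, sizeof(buf), " %d", c);
                strncat(cores, buf, sizeof(cores) - strlen(cores) - 1);
            }
        }
    }

    gethostname(host, sizeof(host));
    printf("rank %d on %s is allowed on cores:%s\n", rank, host, cores);

    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched with the same mpiexec options, the printed masks
should match what -report-bindings shows.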