I don't off-hand see a problem, though I do note that your "working" version incorrectly reports the universe size as 2!
I'll have to take a look at this and get back to you on it.

On Feb 27, 2013, at 3:15 PM, Reuti <re...@staff.uni-marburg.de> wrote:

> Hi,
>
> I have an issue using the option -cpus-per-proc 2. As I have Bulldozer
> machines and I want only one process per FP core, I thought using
> -cpus-per-proc 2 would be the way to go. Initially I had this issue inside
> GridEngine, but then I tried it outside any queuing system and faced exactly
> the same behavior.
>
> @) Each machine has 4 CPUs, each with 16 integer cores, hence 64
> integer cores per machine in total. The Open MPI version used is 1.6.4.
>
>
> a) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>
> with a hostfile containing only the two lines listing the machines:
>
> node006
> node007
>
> This works as I would like it to (see working.txt) when initiated on node006.
>
>
> b) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 64 ./mpihello
>
> But changing the hostfile so that it has a slot count, which might
> mimic the behavior of a machinefile parsed out of any queuing system:
>
> node006 slots=64
> node007 slots=64
>
> This fails with:
>
> --------------------------------------------------------------------------
> An invalid physical processor ID was returned when attempting to bind
> an MPI process to a unique processor on node:
>
> Node: node006
>
> This usually means that you requested binding to more processors than
> exist (e.g., trying to bind N MPI processes to M processors, where N > M),
> or that the node has an unexpectedly different topology.
>
> Double check that you have enough unique processors for all the
> MPI processes that you are launching on this host, and that all nodes
> have identical topologies.
>
> You job will now abort.
> --------------------------------------------------------------------------
>
> (see failed.txt)
>
>
> b1) mpiexec -cpus-per-proc 2 -report-bindings -hostfile machines -np 32 ./mpihello
>
> This works and the found universe is 128 as expected (see only32.txt).
>
>
> c) Maybe the machinefile used is not parsed correctly, so I checked:
>
> c1) mpiexec -hostfile machines -np 64 ./mpihello => works
>
> c2) mpiexec -hostfile machines -np 128 ./mpihello => works
>
> c3) mpiexec -hostfile machines -np 129 ./mpihello => fails as expected
>
> So it got the slot counts correctly.
>
> What am I missing?
>
> -- Reuti
>
> <failed.txt><only32.txt><working.txt>
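
One thing that may be worth checking while I look into it is the plain arithmetic of the three cases a), b) and b1). The Python sketch below is only a back-of-the-envelope model, not Open MPI code. It assumes that processes are placed by slot (each node's slots are filled before moving to the next node, wrapping around when the job is oversubscribed) and that every process then needs `cpus_per_proc` of the node's 64 integer cores. The function names and the wrap-around rule are assumptions made for illustration.

```python
CORES_PER_NODE = 64  # 4 CPUs x 16 integer cores, as stated in the report


def place_by_slot(np_total, slots_per_node, num_nodes=2):
    """Assumed by-slot placement: fill each node's slots in turn, wrapping if needed."""
    counts = [0] * num_nodes
    remaining = np_total
    while remaining > 0:
        for node in range(num_nodes):
            take = min(slots_per_node, remaining)
            counts[node] += take
            remaining -= take
            if remaining == 0:
                break
    return counts


def binding_fits(np_total, slots_per_node, cpus_per_proc=2):
    """True if no node is asked to bind more cores than it has."""
    counts = place_by_slot(np_total, slots_per_node)
    return all(n * cpus_per_proc <= CORES_PER_NODE for n in counts)


# (label, -np, slots per node); a hostfile line without slots=N is modeled as 1 slot
for label, np_total, slots in [("a", 64, 1), ("b", 64, 64), ("b1", 32, 64)]:
    print(label, place_by_slot(np_total, slots), binding_fits(np_total, slots))
```

Under these assumptions, case b) puts all 64 processes on node006, which would need 128 cores there, while a) and b1) stay within 64 cores per node. That matches the reported outcomes, but it is a hypothesis to verify against the -report-bindings output, not a confirmed diagnosis.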