I have tried the 1.7 series (specifically 1.7.3) and I get the same behavior.

When I run "mpirun -oversubscribe -np 24 hostname", three instances of 
"hostname" are run on each node.

The contents of the $PBS_NODEFILE are:
n0007
n0006
n0005
n0004
n0003
n0002
n0001
n0000

However, since I compiled OpenMPI with "--with-tm", it appears that OpenMPI is 
not using the $PBS_NODEFILE. (I tested this by modifying the Torque pbs_mom to 
write a $PBS_NODEFILE that contained "slots=xx" information for each node; 
mpirun complained when I did this.)
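In case it is useful, I assume something like the following is a reasonable way 
to confirm that the tm support was actually built in (on a --with-tm build it 
should show tm entries for the ras and plm frameworks):

ompi_info | grep tm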

Regards,

Jason

________________________________
From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain 
[r...@open-mpi.org]
Sent: Friday, November 22, 2013 11:04 AM
To: Open MPI Users
Subject: Re: [OMPI users] Oversubscription of nodes with Torque and OpenMPI

Really shouldn't matter - this is clearly a bug in OMPI if it is doing mapping 
as you describe. Out of curiosity, have you tried the 1.7 series? Does it 
behave the same?

I can take a look at the code later today and try to figure out what happened.

On Nov 22, 2013, at 9:56 AM, Jason Gans <jg...@lanl.gov> wrote:

On 11/22/13 10:47 AM, Reuti wrote:
Hi,

On 22.11.2013 at 17:32, Gans, Jason D wrote:

I would like to run an instance of my application on every *core* of a small 
cluster. I am using Torque 2.5.12 to run jobs on the cluster. The cluster in 
question is a heterogeneous collection of machines that are all past their 
prime. Specifically, the number of cores ranges from 2-8. Here is the Torque 
"nodes" file:

n0000 np=2
n0001 np=2
n0002 np=8
n0003 np=8
n0004 np=2
n0005 np=2
n0006 np=2
n0007 np=4

When I use openmpi-1.6.3, I can oversubscribe nodes but the tasks are allocated 
to nodes without regard to the number of cores on each node (specified by the 
"np=xx" in the nodes file). For example, when I run "mpirun -np 24 hostname", 
mpirun places three instances of "hostname" on each node, despite the fact that 
some nodes only have two processors and some have more.
Did you also request 24 cores when you submitted the job itself?

-- Reuti
Since there are only 8 Torque nodes in the cluster, I submitted the job by 
requesting 8 nodes, i.e. "qsub -I -l nodes=8".
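If it matters, I believe Torque would also let me spell out the per-node core 
counts explicitly, something along the lines of the following (untested here, 
and the node names are just the ones from the nodes file above):

qsub -I -l nodes=n0002:ppn=8+n0003:ppn=8+n0007:ppn=4+n0000:ppn=2+n0001:ppn=2+n0004:ppn=2+n0005:ppn=2+n0006:ppn=2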


Is there a way to have OpenMPI "gracefully" oversubscribe nodes by allocating 
instances based on the "np=xx" information in the Torque nodes file? Or is this 
a Torque problem?

p.s. I do get the desired behavior when I run *without* Torque and specify the 
following machine file to mpirun:

n0000 slots=2
n0001 slots=2
n0002 slots=8
n0003 slots=8
n0004 slots=2
n0005 slots=2
n0006 slots=2
n0007 slots=4
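In that case I pass the machine file to mpirun explicitly, i.e. something like 
the following (the file name here is just a placeholder):

mpirun -machinefile my_machinefile -np 24 hostname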

Regards,

Jason


