Hello,

I would like to run an instance of my application on every *core* of a small 
cluster. I am using Torque 2.5.12 to run jobs on the cluster. The cluster in 
question is a heterogeneous collection of machines that are all past their 
prime. Specifically, the number of cores ranges from 2 to 8. Here is the Torque 
"nodes" file:

n0000 np=2
n0001 np=2
n0002 np=8
n0003 np=8
n0004 np=2
n0005 np=2
n0006 np=2
n0007 np=4

When I use openmpi-1.6.3, I can oversubscribe nodes but the tasks are allocated 
to nodes without regard to the number of cores on each node (specified by the 
"np=xx" in the nodes file). For example, when I run "mpirun -np 24 hostname", 
mpirun places three instances of "hostname" on each node, despite the fact that 
some nodes only have two processors and some have more.
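
To make the arithmetic concrete, here is a toy Python model of the two placements: the even per-node spread I observe, and a fill-by-slot spread that respects the core counts. The function names are my own, and this only models the counting; it is not how Open MPI's mapper is implemented.

```python
# Toy model only: counts ranks per node under two placement policies.
# Node names and core counts are taken from the Torque "nodes" file above.
NODES = {
    "n0000": 2, "n0001": 2, "n0002": 8, "n0003": 8,
    "n0004": 2, "n0005": 2, "n0006": 2, "n0007": 4,
}


def place_by_node(nodes, nranks):
    """Deal ranks out one node at a time, ignoring core counts."""
    names = list(nodes)
    counts = dict.fromkeys(names, 0)
    for rank in range(nranks):
        counts[names[rank % len(names)]] += 1
    return counts


def place_by_slot(nodes, nranks):
    """Fill each node up to its core count before moving to the next.

    If nranks exceeds the total core count, start another pass, so any
    oversubscription stays proportional to the core counts.
    """
    counts = dict.fromkeys(nodes, 0)
    remaining = nranks
    while remaining > 0:
        for name, slots in nodes.items():
            take = min(slots, remaining)
            counts[name] += take
            remaining -= take
    return counts


if __name__ == "__main__":
    print("by node:", place_by_node(NODES, 24))
    print("by slot:", place_by_slot(NODES, 24))
```

With 24 ranks, the first policy gives every node three ranks (the behavior described above), while the second never exceeds a node's core count.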

Is there a way to have OpenMPI "gracefully" oversubscribe nodes by allocating 
instances based on the "np=xx" information in the Torque nodes file? Is this a 
Torque problem?

P.S. I do get the desired behavior when I run *without* Torque and specify the 
following machine file to mpirun:

n0000 slots=2
n0001 slots=2
n0002 slots=8
n0003 slots=8
n0004 slots=2
n0005 slots=2
n0006 slots=2
n0007 slots=4
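
Since the two files differ only in "np=" versus "slots=", the machine file can be generated from the Torque nodes file. Below is a minimal Python sketch of that conversion. The function name is hypothetical, and two points are assumptions on my part: that a node line without "np=" should count as one slot, and that "#" starts a comment in the nodes file.

```python
import re


def nodes_to_hostfile(nodes_text):
    """Convert Torque 'nodes' file text into mpirun machine-file text.

    Each 'host np=N' line becomes 'host slots=N'. Other node
    properties on the line are ignored.
    """
    out = []
    for raw in nodes_text.splitlines():
        # Assumption: '#' begins a comment; drop it and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        slots = 1  # Assumption: treat a missing np= as a single slot.
        for field in fields[1:]:
            match = re.fullmatch(r"np=(\d+)", field)
            if match:
                slots = int(match.group(1))
        out.append(f"{host} slots={slots}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "n0000 np=2\nn0002 np=8\nn0007 np=4\n"
    print(nodes_to_hostfile(sample), end="")
```

The result can be written to a file and passed to mpirun as the machine file. I have only confirmed that such a file gives the desired placement outside of Torque.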

Regards,

Jason