On 22.11.2013 at 18:56, Jason Gans wrote:

> On 11/22/13 10:47 AM, Reuti wrote:
>> Hi,
>> 
>> On 22.11.2013 at 17:32, Gans, Jason D wrote:
>> 
>>> I would like to run an instance of my application on every *core* of a 
>>> small cluster. I am using Torque 2.5.12 to run jobs on the cluster. The 
>>> cluster in question is a heterogeneous collection of machines that are all 
>>> past their prime. Specifically, the number of cores ranges from 2-8. Here 
>>> is the Torque "nodes" file:
>>> 
>>> n0000 np=2
>>> n0001 np=2
>>> n0002 np=8
>>> n0003 np=8
>>> n0004 np=2
>>> n0005 np=2
>>> n0006 np=2
>>> n0007 np=4
>>> 
>>> When I use openmpi-1.6.3, I can oversubscribe nodes but the tasks are 
>>> allocated to nodes without regard to the number of cores on each node 
>>> (specified by the "np=xx" in the nodes file). For example, when I run 
>>> "mpirun -np 24 hostname", mpirun places three instances of "hostname" on 
>>> each node, despite the fact that some nodes only have two processors and 
>>> some have more.
>> You submitted the job itself by requesting 24 cores for it too?
>> 
>> -- Reuti
> Since there are only 8 Torque nodes in the cluster, I submitted the job by 
> requesting 8 nodes, i.e. "qsub -I -l nodes=8".

No, AFAICT it's necessary to request 24 there as well. To investigate this further, it 
would also be good to have your job script copy the $PBS_NODEFILE to your home 
directory for later inspection, i.e. to check whether you are already getting the 
correct values at that point.
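
A rough sketch of what such a job script could look like (the qsub line and the 
file name under $HOME are only examples; the exact resource syntax depends on 
your setup):

#!/bin/sh
# Submitted e.g. with:
#   qsub -l nodes=n0000:ppn=2+n0001:ppn=2+n0002:ppn=8+n0003:ppn=8+n0004:ppn=2+n0005:ppn=2+n0006:ppn=2+n0007:ppn=4 job.sh
# so that all 24 slots are requested, not just one slot per node.

# Keep a copy of the node file Torque hands to the job, for later inspection.
cp "$PBS_NODEFILE" "$HOME/pbs_nodefile.$PBS_JOBID"

# One line per granted slot; an Open MPI built with Torque (tm) support
# uses exactly this allocation, so the count should be 24 here.
echo "Slots granted: $(wc -l < "$PBS_NODEFILE")"

mpirun -np 24 hostname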

-- Reuti

>> 
>> 
>>> Is there a way to have OpenMPI "gracefully" oversubscribe nodes by 
>>> allocating instances based on the "np=xx" information in the Torque nodes 
>>> file? Is this a Torque problem?
>>> 
>>> p.s. I do get the desired behavior when I run *without* Torque and specify 
>>> the following machine file to mpirun:
>>> 
>>> n0000 slots=2
>>> n0001 slots=2
>>> n0002 slots=8
>>> n0003 slots=8
>>> n0004 slots=2
>>> n0005 slots=2
>>> n0006 slots=2
>>> n0007 slots=4
>>> 
>>> Regards,
>>> 
>>> Jason
>>> 
