On Nov 22, 2013, at 10:03 AM, Reuti <re...@staff.uni-marburg.de> wrote:
> On 22.11.2013 at 18:56, Jason Gans wrote:
>
>> On 11/22/13 10:47 AM, Reuti wrote:
>>> Hi,
>>>
>>> On 22.11.2013 at 17:32, Gans, Jason D wrote:
>>>
>>>> I would like to run an instance of my application on every *core* of a
>>>> small cluster. I am using Torque 2.5.12 to run jobs on the cluster. The
>>>> cluster in question is a heterogeneous collection of machines that are
>>>> all past their prime. Specifically, the number of cores ranges from 2
>>>> to 8. Here is the Torque "nodes" file:
>>>>
>>>> n0000 np=2
>>>> n0001 np=2
>>>> n0002 np=8
>>>> n0003 np=8
>>>> n0004 np=2
>>>> n0005 np=2
>>>> n0006 np=2
>>>> n0007 np=4
>>>>
>>>> When I use openmpi-1.6.3, I can oversubscribe nodes, but the tasks are
>>>> allocated to nodes without regard to the number of cores on each node
>>>> (specified by the "np=xx" in the nodes file). For example, when I run
>>>> "mpirun -np 24 hostname", mpirun places three instances of "hostname"
>>>> on each node, despite the fact that some nodes only have two processors
>>>> and some have more.
>>>
>>> You submitted the job itself by requesting 24 cores for it too?
>>>
>>> -- Reuti
>>
>> Since there are only 8 Torque nodes in the cluster, I submitted the job
>> by requesting 8 nodes, i.e. "qsub -I -l nodes=8".
>
> No, AFAICT it's necessary to request 24 there too. To investigate
> further, it would also help to have your job script copy $PBS_NODEFILE
> to your home directory for later inspection, i.e. to check whether you
> are already getting the correct values there.

Not really - we take the number of slots on each node and add them
together.

Question: is that a copy/paste of the actual PBS_NODEFILE? It doesn't
look right to me - there is supposed to be one node entry for each slot.
In other words, it should have looked like this:

n0000
n0000
n0001
n0001
n0002
n0002
...

> -- Reuti
>
>>>> Is there a way to have OpenMPI "gracefully" oversubscribe nodes by
>>>> allocating instances based on the "np=xx" information in the Torque
>>>> nodes file? Is this a Torque problem?
>>>>
>>>> p.s. I do get the desired behavior when I run *without* Torque and
>>>> specify the following machine file to mpirun:
>>>>
>>>> n0000 slots=2
>>>> n0001 slots=2
>>>> n0002 slots=8
>>>> n0003 slots=8
>>>> n0004 slots=2
>>>> n0005 slots=2
>>>> n0006 slots=2
>>>> n0007 slots=4
>>>>
>>>> Regards,
>>>>
>>>> Jason
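
FWIW, a minimal job script along the lines Reuti suggests might look like
this (just a sketch - the job name and copy destination are arbitrary):

#!/bin/sh
#PBS -N nodefile-check
# Save the allocated node list for later inspection; it should contain
# one line per slot, so a node with np=8 should appear eight times.
cp "$PBS_NODEFILE" "$HOME/pbs_nodefile.$PBS_JOBID"

# Under Torque, mpirun reads the allocation itself; with no -np it
# starts one process per allocated slot.
mpirun hostname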
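
And if the goal is one process per core on this particular cluster, the
resource request has to spell out the per-node core counts, since
"-l nodes=8" gives you one slot per node. With Torque's "+" syntax,
something like the following should populate $PBS_NODEFILE with one
entry per core (untested here; host names taken from the nodes file
above):

qsub -I -l nodes=n0000:ppn=2+n0001:ppn=2+n0002:ppn=8+n0003:ppn=8+n0004:ppn=2+n0005:ppn=2+n0006:ppn=2+n0007:ppn=4

After that, "mpirun -np 24 hostname" (or plain "mpirun hostname") should
place processes according to the slot counts instead of round-robin
across eight single-slot entries.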