Hi,

On 26.12.2011 at 17:55, Santosh Ansumali wrote:

> Dear Dr. Correa,
>    Sorry for my ignorance of cluster maintenance. Our cluster was
> set up by a vendor, and we do not know much about the details.
> I understand the concept, but we are not able to work out what
> precisely we need to try in order to allow oversubscription.
> My current submission file is as follows:
> #!/bin/bash
> #$ -N first
> #$ -S /bin/bash
> #$ -cwd
> #$ -e $JOB_ID.$JOB_NAME.ERROR
> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
> #$ -P faculty_prj
> #$ -p 0
> #$ -pe orte 8
> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out

If it's a general question about the queuing system, you can also turn to 
the SGE users list: 
http://gridengine.org/blog/2011/01/27/gridengine-users-mailing-list/ If you 
have no support contract with the vendor, you will need someone at your site 
who is in charge of the cluster and becomes familiar with SGE administration.

$ man queue_conf # Have a look what can be defined in a queue.

$ qconf -sql # Shows what queues are defined.

$ qconf -mq all.q # Replace all.q with the queue found above and edit the slot count.

$ man sge_pe # Check the options for the PE.

$ qconf -spl # Shows what PEs are defined.

$ qconf -mp orte # Check the allocation rule, what's there?

Then, in the job script, change the 8 to the number you used above.
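A minimal sketch of that last step, assuming the admin has already raised the slot count (for example from 8 to 16) with qconf -mq. The helper name set_pe_slots is hypothetical; it rewrites only the "#$ -pe orte N" directive and keeps a .bak copy. The mpirun line needs no change, because SGE sets NSLOTS to the number of slots actually granted.

```shell
#!/bin/bash
# Hypothetical helper: change the slot count requested by the
# "#$ -pe orte N" directive in an SGE job script.
# Run it only after the queue's slot count has been raised (qconf -mq);
# otherwise the job may stay pending because no queue offers enough slots.
set_pe_slots() {
    local script=$1 slots=$2
    # Reject anything that is not a plain positive integer.
    case $slots in
        ''|*[!0-9]*) echo "set_pe_slots: slot count must be an integer" >&2; return 1 ;;
    esac
    # Only the embedded directive line is touched; "-np $NSLOTS" in the
    # mpirun line picks up the new value automatically at run time.
    sed -i.bak -E 's/^(#\$ -pe orte )[0-9]+/\1'"$slots"'/' "$script"
}
```

Usage: set_pe_slots job.sh 16, then submit job.sh with qsub as before.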

-- Reuti


> What change should we make to allow oversubscription?
> Best,
> Santosh
> 
> 
> 
> On Mon, Dec 26, 2011 at 9:02 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> On 23.12.2011 at 21:16, Gustavo Correa wrote:
>> 
>>> I don't know about Grid Engine/SGE.
>>> However, in Torque, the batch/resource manager I use,
>>> to allow oversubscription, you need to modify the batch server nodes file
>>> and pretend the nodes have more cores than the physical ones.
>>> [Something like 'node01 np=8' would change to 'node01 np=16' for instance.]
>>> Maybe there is something similar in SGE.
>> 
>> Yep, it's in the queue definition, where you can define the slots per queue 
>> instance on each machine.
>> 
>> Depending on your setup: if you have more than one queue per machine, the 
>> admin might already have set up an RQS (Resource Quota Set) or an absolute 
>> limit of slots across all queues residing on a host in the exechost 
>> definition. In that case, this needs to be adjusted too.
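For illustration only, here is roughly what the two limits mentioned above might look like; the host name node01, the value 16 and the rule name slots_per_host are hypothetical. The first line is the kind of entry to look for in the exechost definition (shown with qconf -se node01, edited with qconf -me node01). The block after it is a resource quota set (shown with qconf -srqs, edited with qconf -mrqs) that uses a dynamic limit of twice the detected core count; check sge_resource_quota(5) for your SGE version before relying on $num_proc expressions.

```
complex_values        slots=16

{
   name         slots_per_host
   description  "allow two slots per core"
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc*2
}
```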
>> 
>> -- Reuti
>> 
>> 
>>> We had bad results [programs hanging or aborting]
>>> when trying to run large programs that include PDE solvers
>>> [climate models] with oversubscription allowed, even when a substantial 
>>> amount of RAM was idle.
>>> That was a while ago, and I have not pursued the issue any further.
>>> Maybe context switching among the [surplus] processes is the problem.
>>> Of course, for 'hello, world' type programs oversubscription works well.
>>> As for where the threshold lies beyond which oversubscription makes a
>>> program break down, I'd guess only trial and error can tell.
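If you go the trial-and-error route, a sketch like the following could be run inside one job script to compare process counts. The function name time_runs, the MPIRUN/PROG overrides and the log file names are assumptions. Whether mpirun will start more processes than SGE granted slots depends on the Open MPI version and its settings; if it refuses, raise the slot count as described above and submit one job per count instead. Setting the MCA parameter mpi_yield_when_idle to 1 makes waiting ranks yield the CPU instead of busy-polling, which the Open MPI documentation suggests when several ranks share a core.

```shell
#!/bin/bash
# Sketch: run the same solver at several process counts and record the
# exit status and wall-clock seconds of each run.
# MPIRUN and PROG are hypothetical overrides; the defaults match the job script.
time_runs() {
    local mpirun=${MPIRUN:-/opt/mpi/openmpi/1.3.3/gnu/bin/mpirun}
    local prog=${PROG:-./test_vel.out}
    local np start end status
    for np in "$@"; do
        start=$(date +%s)
        # mpi_yield_when_idle=1: idle ranks give up the CPU while waiting.
        "$mpirun" --mca mpi_yield_when_idle 1 -np "$np" "$prog" \
            > "run_np${np}.log" 2>&1
        status=$?
        end=$(date +%s)
        echo "np=$np status=$status seconds=$(( end - start ))"
    done
}
```

Usage inside the job script: time_runs 8 12 16. A nonzero status or a run that never finishes is exactly the kind of breakdown described above.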
>>> 
>>> I hope this helps,
>>> Gus Correa
>>> 
>>> On Dec 23, 2011, at 2:42 PM, Santosh Ansumali wrote:
>>> 
>>>> Dear All,
>>>>       We are running a PDE solver which is memory bound. Due to
>>>> cache-related issues, a smaller number of grid points per core leads to
>>>> better performance for this code. Thus, although more than 2 GB of
>>>> memory is available per core, we get good performance by using
>>>> less than 1 GB per core.
>>>> 
>>>> I want to know whether oversubscribing the cores can potentially
>>>> improve the performance of such a code. My thinking is that if I
>>>> oversubscribe the cores, each process will be using less than 1 GB, so
>>>> cache-related problems will be less severe. Is this logic correct, or
>>>> will performance deteriorate further due to cache conflicts?
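A back-of-the-envelope check of this reasoning, as a sketch: the problem size (8192 MB) and core count (8) are hypothetical. Splitting the grid over twice as many ranks halves the memory per rank, but when two ranks time-share one core, the total data streamed through that core's caches per time step stays the same.

```shell
#!/bin/bash
# Hypothetical helper: working set per MPI rank and per physical core, in MB.
# Assumes the grid is split evenly across ranks and that the rank count is a
# multiple of the core count (e.g. 16 ranks on 8 cores = 2 ranks per core).
working_set() {
    local total_mb=$1 nranks=$2 ncores=$3
    local per_rank=$(( total_mb / nranks ))
    local ranks_per_core=$(( nranks / ncores ))
    local per_core=$(( per_rank * ranks_per_core ))
    echo "ranks=$nranks per_rank_mb=$per_rank per_core_mb=$per_core"
}

working_set 8192 8 8    # ranks=8 per_rank_mb=1024 per_core_mb=1024
working_set 8192 16 8   # ranks=16 per_rank_mb=512 per_core_mb=1024
```

Whether the smaller per-rank blocks still help in practice (through better locality within each rank's time slice) or hurt (through cache eviction at every context switch) is something only a measurement can settle.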
>>>>     If oversubscription can help, how should I modify my
>>>> submission file (using Sun Grid Engine) to enable oversubscription of
>>>> cores?
>>>> My current submission file is as follows:
>>>> #!/bin/bash
>>>> #$ -N first
>>>> #$ -S /bin/bash
>>>> #$ -cwd
>>>> #$ -e $JOB_ID.$JOB_NAME.ERROR
>>>> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
>>>> #$ -P faculty_prj
>>>> #$ -p 0
>>>> #$ -pe orte 8
>>>> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out
>>>> 
>>>> Is it possible to allow oversubscription by modifying the submission
>>>> file itself? Or do I need to change hostfiles somehow?
>>>> Thanks for your help!
>>>> Best Regards
>>>> Santosh Ansumali,
>>>> Faculty Fellow,
>>>> Engineering Mechanics Unit
>>>> Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR)
>>>> Jakkur, Bangalore-560 064, India
>>>> Tel: + 91 80 22082938
>>>> 
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> 
>> 
> 