Hi,

On 26.12.2011 at 17:55, Santosh Ansumali wrote:

> Dear Dr. Correa,
> Sorry for my ignorance on cluster maintenance. So far our
> cluster is just set-up by a vendor and we do not know more details.
> So far I am understanding the concept but we are not able to follow
> what precisely we need to try out for allowing oversubscription.
> In this submission file
> my current submission file is written as follows
> #!/bin/bash
> #$ -N first
> #$ -S /bin/bash
> #$ -cwd
> #$ -e $JOB_ID.$JOB_NAME.ERROR
> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
> #$ -P faculty_prj
> #$ -p 0
> #$ -pe orte 8
> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out

If it's a common question regarding the queuing system, you can also turn to the SGE list:

http://gridengine.org/blog/2011/01/27/gridengine-users-mailing-list/

If you have no contract with the vendor, you will need someone at your site who is in charge of it and gets familiar with SGE administration.

$ man queue_conf     # Have a look at what can be defined in a queue.
$ qconf -sql         # Shows what queues are defined.
$ qconf -mq all.q    # Replace all.q with the one you found above and edit the slot count.
$ man sge_pe         # Check the options for the PE.
$ qconf -spl         # Shows what PEs are defined.
$ qconf -mp orte     # Check the allocation rule, what's there?

Then change the 8 in the job script to the number you used above.

-- Reuti

> what change we should do to allow for oversubscription.
> Best,
> Santosh
>
>
>
> On Mon, Dec 26, 2011 at 9:02 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>> On 23.12.2011 at 21:16, Gustavo Correa wrote:
>>
>>> I don't know about the grid engine/ SGE.
>>> However, in Torque, the batch/resource manager I use,
>>> to allow oversubscription, you need to modify the batch server nodes file
>>> and pretend the nodes have more cores than the physical ones.
>>> [Something like 'node01 np=8' would change to 'node01 np=16' for instance.]
>>> Maybe there is something similar in SGE.
>>
>> Yep, it's in the queue definition, where you can define the slots per queue
>> instance on each machine.
>>
>> Depending on your setup: if you have more than one queue per machine, the
>> admin might already have set up some RQS (Resource Quota Set) or an absolute
>> limit of slots across all queues residing on a host in the exechost
>> definition. In this case this needs to be adjusted too.
>>
>> -- Reuti
>>
>>
>>> We had bad results [program hanging or aborting]
>>> when trying to run large programs which include PDE solvers
>>> [climate models] and allowing oversubscription, even when a substantial
>>> amount of RAM was idle.
>>> That was a while ago, and I have not pursued the issue any further.
>>> Maybe context switching among the [surplus of] processes is the problem.
>>> Of course for 'hello, world' type of programs oversubscription works well.
>>> Where the threshold is at which oversubscription makes a program break down,
>>> I'd guess only trial and error may tell.
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> On Dec 23, 2011, at 2:42 PM, Santosh Ansumali wrote:
>>>
>>>> Dear All,
>>>> We are running a PDE solver which is memory bound. Due to
>>>> cache-related issues, a smaller number of grid points per core leads to
>>>> better performance for this code. Thus, though available memory per
>>>> core is more than 2 GB, we are able to get good performance by using
>>>> less than 1 GB per core.
>>>>
>>>> I want to know whether oversubscribing the cores can potentially
>>>> improve performance of such a code. My thinking is that if I
>>>> oversubscribe the cores, each thread will be using less than 1 GB so
>>>> cache-related problems will be less severe. Is this logic correct, or
>>>> will performance deteriorate further due to cache conflicts?
>>>> If over-subscription can help, how shall I modify the
>>>> submission file (using Sun Grid Engine) to enable over-subscription of
>>>> cores?
>>>> my current submission file is written as follows
>>>> #!/bin/bash
>>>> #$ -N first
>>>> #$ -S /bin/bash
>>>> #$ -cwd
>>>> #$ -e $JOB_ID.$JOB_NAME.ERROR
>>>> #$ -o $JOB_ID.$JOB_NAME.OUTPUT
>>>> #$ -P faculty_prj
>>>> #$ -p 0
>>>> #$ -pe orte 8
>>>> /opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np $NSLOTS ./test_vel.out
>>>>
>>>> Is it possible to allow over-subscription by modifying submission file
>>>> itself? Or do I need to change hostfiles somehow?
>>>> Thanks for your help!
>>>> Best Regards
>>>> Santosh Ansumali,
>>>> Faculty Fellow,
>>>> Engineering Mechanics Unit
>>>> Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR)
>>>> Jakkur, Bangalore-560 064, India
>>>> Tel: + 91 80 22082938
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
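
For reference, the job-script side of the change discussed in this thread might look roughly like the sketch below. It assumes the admin has already raised the slot count of the queue with qconf -mq and that the allocation rule of the "orte" PE allows that many slots per host. The helper name make_job_script and the oversubscription factor of 2 are illustrative assumptions, not something from the thread; the only line that differs from the original submission file is the "-pe orte" request.

```shell
#!/bin/bash
# Hypothetical helper (not part of SGE or Open MPI): prints a job script that
# requests physical_cores * factor slots from the "orte" parallel environment.
make_job_script() {
    local physical_cores=$1   # cores actually present (8 in the thread)
    local factor=$2           # processes per core; an assumption, tune by trial
    local slots=$(( physical_cores * factor ))
    # Unquoted heredoc: \$ stays a literal "$", ${slots} is expanded here.
    cat <<EOF
#!/bin/bash
#\$ -N first
#\$ -S /bin/bash
#\$ -cwd
#\$ -e \$JOB_ID.\$JOB_NAME.ERROR
#\$ -o \$JOB_ID.\$JOB_NAME.OUTPUT
#\$ -P faculty_prj
#\$ -p 0
#\$ -pe orte ${slots}
/opt/mpi/openmpi/1.3.3/gnu/bin/mpirun -np \$NSLOTS ./test_vel.out
EOF
}

# Example: 8 physical cores, 2 processes per core.
make_job_script 8 2
```

The output can be redirected to a file and submitted with qsub. The job will only be scheduled if the queue (and any RQS or exechost limit) really offers that many slots, so the queue-side change has to come first.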