Re: [gridengine users] qsub and reservation

Reuti Thu, 09 Mar 2017 09:55:35 -0800

> On 09.03.2017 at 17:41, Roberto Nunnari <roberto.nunn...@supsi.ch> wrote:
> 
> On 09.03.2017 15:14, Reuti wrote:
>> Hi,
>> 
>>> On 09.03.2017 at 14:24, Roberto Nunnari <roberto.nunn...@supsi.ch> wrote:
>>> 
>>> Hi Reuti.
>>> Hi William.
>>> 
>>> Here are the settings you asked for:
>>> params                            MONITOR=1
>>> max_reservation                   32
>>> default_duration                  0:10:0
>>> 
>>> I cannot understand how what I see in 
>>> ${SGE_ROOT}/${SGE_CELL}/common/schedule can help me. Here's a short 
>>> extract for a job submitted with -R y; it keeps repeating without change:
>>> ...
>>> 3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
>>> 3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000
>>> 3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
>>> 3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000
>> 
>> What else is running in the cluster? Are there other jobs blocked which 
>> would otherwise slip in? Do they all request -l h_rt=…?
> 
> Hi.
> 
> There are always smaller jobs (without -R y) pending in the queue that get in 
> front of bigger jobs (with -R y).
> The user of this big job doesn't use options like h_rt, mem_free, 
> etc., but only requests a particular node, i.e. hostname=node19.cluster


So essentially node19 should get drained over time.


> 
>> 
>> (When no job requests -l h_rt=… and only the default length applies [which 
>> won't be enforced], SGE might look for another node to make the reservation.)
> 
> The other users usually use -l h_rt=.. and mem_free=.., and as theirs are serial 
> jobs or parallel jobs that ask for fewer resources, they slip in front of the job 
> that asks for more resources, even though it was submitted long before and uses 
> -R y.

What you may be seeing, of course, is back-filling of node19. Can you check 
the h_rt requests of the other jobs already running on node19? As long as the 
longest job on this node is still running, shorter jobs can be back-filled, 
provided their runtime is shorter than the remaining runtime of that longest job.
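
The back-filling rule can be sketched roughly as follows. This is a simplified 
illustration, not the scheduler's actual code: it assumes the reserving job 
needs the whole node, so the reservation starts when the longest running job 
ends, and the function names and the use of plain seconds are hypothetical.

```python
def reservation_start(now, remaining_runtimes):
    """Earliest time the whole node is free: when the longest running job ends.

    `remaining_runtimes` holds the remaining seconds (based on h_rt) of each
    job currently running on the node. Assumption: the reserving job needs
    every slot on the node.
    """
    return now + max(remaining_runtimes)


def can_backfill(now, candidate_h_rt, reserved_start):
    """A pending job may be back-filled if it is guaranteed to finish
    (per its requested h_rt) before the reservation is due to start."""
    return now + candidate_h_rt <= reserved_start


now = 0
start = reservation_start(now, [3600, 600])   # longest job has 1 h left
print(can_backfill(now, 1800, start))         # 30 min job fits
print(can_backfill(now, 7200, start))         # 2 h job would delay the reservation
```

As long as short jobs with a small h_rt keep arriving, they can legitimately 
start on node19 without delaying the reserved job.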


> 
> One more question: how can I tell that something is happening with the 
> reservation (i.e. see that the scheduler has started reserving slots) by 
> looking at the file ${SGE_ROOT}/${SGE_CELL}/common/schedule?

When you request a specific node, the reservation can't move to another node. I 
have seen a reservation move only when the job with -R y may be scheduled 
freely anywhere in the cluster and the already running jobs have no h_rt (hence 
default_duration applies) and run much longer than anticipated, so that at some 
point the reservation can be fulfilled sooner by moving to another node.
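
To check whether the reservation changes between scheduling runs, the schedule 
file could be scanned with a small script like the sketch below. It assumes the 
colon-separated field layout described in sched_conf(5) under MONITOR 
(jobid:taskid:state:start_time:duration:level_char:object_name:resource_name:utilization); 
the function and type names are made up for illustration.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Field names follow the assumed sched_conf(5) layout of the schedule file.
ScheduleRecord = namedtuple(
    "ScheduleRecord",
    "job_id task_id state start duration level obj resource utilization",
)


def parse_schedule_line(line):
    """Parse one schedule line; return None for separators or malformed lines."""
    parts = line.strip().split(":")
    if len(parts) != 9:
        return None
    try:
        return ScheduleRecord(
            int(parts[0]), int(parts[1]), parts[2],
            int(parts[3]), int(parts[4]),
            parts[5], parts[6], parts[7], float(parts[8]),
        )
    except ValueError:
        return None


def reservation_starts(lines, job_id):
    """Return the sequence of distinct RESERVING start times seen for job_id.

    More than one entry means the planned start moved between scheduling runs.
    """
    starts = []
    for line in lines:
        rec = parse_schedule_line(line)
        if rec is None or rec.job_id != job_id or rec.state != "RESERVING":
            continue
        if not starts or starts[-1] != rec.start:
            starts.append(rec.start)
    return starts


sample = [
    "3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000",
    "3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000",
    "::::::::",
    "3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000",
    "3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000",
]
for ts in reservation_starts(sample, 3653372):
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
```

For the extract quoted above, this yields a single start time, i.e. the 
reservation is present in every run but its planned start does not change.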

-- Reuti
