Re: [gridengine users] qsub and reservation

Roberto Nunnari Thu, 09 Mar 2017 10:32:26 -0800


On 09.03.2017 18:52, Reuti wrote:

On 09.03.2017 at 17:41, Roberto Nunnari <roberto.nunn...@supsi.ch> wrote:

On 09.03.2017 15:14, Reuti wrote:

Hi,

On 09.03.2017 at 14:24, Roberto Nunnari <roberto.nunn...@supsi.ch> wrote:

Hi Reuti.
Hi William.

Here are the settings you asked for:
params                            MONITOR=1
max_reservation                   32
default_duration                  0:10:0

I don't understand how what I see in ${SGE_ROOT}/${SGE_CELL}/common/schedule 
can help me. Here's a short extract for a job submitted with -R y; it 
keeps repeating without change:
...
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000
3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000


What else is running in the cluster? Are there other jobs blocked which would 
otherwise slip in? All request -l h_rt=…?


Hi.

There are always smaller jobs (without -R y) pending in the queue that get in 
front of bigger jobs (with -R y).
The user of this big job doesn't use options like h_rt, mem_free, etc., 
but only asks for a particular node, i.e. hostname=node19.cluster


So essentially the node19 should get drained over time.

Yes, I expect that over time slots on node19 will be reserved for the job 
requesting reservation, as they become free when jobs running on node19 exit.


(When no job requests -l h_rt=… and only the default length applies [which won't 
be enforced], SGE might look for another node to make the reservation.)


The other users usually use -l h_rt=.. and mem_free=.. and, as their jobs are 
serial jobs or parallel jobs that ask for fewer resources, they slip in front 
of the job that asks for more resources, even though it was submitted long 
before and uses -R y.


What you can see, of course, is the possible back-filling of node19. Can you 
check the h_rt requests of the other jobs already running on node19? 
As long as the longest job on this node is still running, shorter jobs can be 
filled in, provided their runtime is shorter than the time this longest job 
will continue to run.
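
To make the back-filling condition concrete, here is a minimal sketch in Python. It is a simplified model of the rule described above, not SGE's actual scheduler logic; the function names (parse_hms, can_backfill) are hypothetical, and the only assumption is that a job may be filled in when its requested (or default) runtime ends no later than the start of the reservation.

```python
def parse_hms(text: str) -> int:
    """Convert an h:m:s string such as "0:10:0" into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def can_backfill(now: int, runtime_s: int, reservation_start: int) -> bool:
    """Simplified rule (an assumption, not SGE source code): a job fits
    if it is expected to finish before the reservation begins."""
    return now + runtime_s <= reservation_start


# default_duration from the settings quoted earlier in the thread
default_duration_s = parse_hms("0:10:0")
```

With this model, a job without h_rt is treated as lasting default_duration, which is why short or unspecified runtimes can keep slipping in ahead of the reserving job.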


One more question: how can I tell that something is moving with the 
reservation (i.e. see that the scheduler has started reserving slots) by looking 
at the file ${SGE_ROOT}/${SGE_CELL}/common/schedule?


When you request a specific node, the reservation can't move to another node. I 
saw this only in cases where the job with -R y may be scheduled freely inside 
the cluster, the already running jobs have no h_rt (hence the default_duration 
applies), and they run much longer than anticipated, so that the reservation at 
some point can be fulfilled sooner when it moves to another node.

I don't mean moving from node to node. By "moving" I mean that something 
happens in the scheduler: that the scheduler reserves a slot for the 
pending job requesting reservation. In the schedule file, I see only 
lines with the word RESERVING, and never something like RESERVED, or 
small changes that tell me that something is changing. I always see 
lines like these:

3653372:1:RESERVING:1489043424:660:P:smp:slots:32.000000
3653372:1:RESERVING:1489043424:660:Q:long.q@node19.cluster:slots:32.000000

I believe that if the scheduler reserves a slot, something in these 
lines should change.
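
For checking whether anything changes between scheduling runs, a minimal Python sketch that splits these lines into fields might help. The field names are an assumption based on the layout I recall from sched_conf(5) (job id, task id, state, start time, duration, level, object, resource, utilization); as far as I remember, that page lists RESERVING among the states and no RESERVED state. The helper names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class ScheduleRecord(NamedTuple):
    job_id: int
    task_id: int
    state: str          # e.g. RESERVING
    start_time: int     # seconds since the epoch
    duration_s: int
    level: str          # e.g. P or Q in the extract above
    object_name: str    # PE name or queue instance
    resource: str
    utilization: float


def parse_schedule_line(line: str) -> Optional[ScheduleRecord]:
    """Return a record, or None for lines that do not have nine
    non-empty colon-separated fields (separators, "...", blanks)."""
    parts = line.strip().split(":")
    if len(parts) != 9 or not parts[0]:
        return None
    return ScheduleRecord(int(parts[0]), int(parts[1]), parts[2],
                          int(parts[3]), int(parts[4]), parts[5],
                          parts[6], parts[7], float(parts[8]))


def reservation_starts(lines: Iterable[str], job_id: int) -> List[int]:
    """Collect the distinct start times recorded for a job's RESERVING lines."""
    starts = set()
    for line in lines:
        rec = parse_schedule_line(line)
        if rec and rec.job_id == job_id and rec.state == "RESERVING":
            starts.add(rec.start_time)
    return sorted(starts)
```

Feeding the whole schedule file to reservation_starts(open(path), 3653372) would show whether the planned start time of the reservation ever differs between scheduling intervals; a single value means every run recorded the same start.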


Thank you.

--
Roberto Nunnari
Servizi Informatici Ti-Edu
Via Pobiette 11 - 6928 Manno - Switzerland
helpdesk email: mailto: h...@ti-edu.ch
direct email: mailto:roberto.nunn...@supsi.ch
tel: +41-58-6666561
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
