Hi,

> On 19.04.2017 at 21:52, flowers-gridus...@hagsc.org wrote:
> 
> Some of the jobs I run essentially just copy stuff to/from a remote
> filesystem, and as such more or less max out the available network
> bandwidth.  This in itself is not a problem, but it can be if more than
> one such job runs on the same system, or if too many are running
> in total.
> 
> The solution seemed straightforward - create a complex value that would
> limit per machine and per queue usage of high io jobs, and have those
> jobs request the resource.
> 
> First, the complex value itself:
> qconf -sc:
> #name   shortcut type relop requestable consumable default urgency
> high_io io       INT  <=    YES         JOB        0       100

Are these parallel jobs? Otherwise "consumable YES" could do it too.
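
For background: the difference only shows up for parallel jobs, because a "YES"
consumable is debited per granted slot, while a "JOB" consumable is debited once per
job. A rough sketch (the PE name "smp" is just an assumption here):

$ qsub -pe smp 4 -l high_io=1 -q all.q do_thing
# consumable YES: debits 4 x high_io=1, i.e. 4 in total
# consumable JOB: debits high_io=1 once, regardless of the slot count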


> Per queue limit:
> qconf -sq all.q:
> complex_values high_io=10

This limit works differently than you might expect: it will limit the high_io
complex to 10 per queue instance, i.e. per exechost. If you have just one queue,
it's essentially the same as defining it on the exechost level, just a matter of
taste. If you have more than one queue per exechost, it can be useful to split the
consumption of a certain resource, e.g. 4 in one queue, 8 in the other queue, and
in addition 10 per exechost in total.
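
As a sketch of that last case (the second queue name "batch.q" is made up here; the
host name is taken from your example):

$ qconf -mq all.q
…
complex_values    high_io=4

$ qconf -mq batch.q
…
complex_values    high_io=8

$ qconf -me pc65-gsc
…
complex_values    high_io=10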

To limit the queue as a whole (rather than per queue instance), you can either
define the consumable on the global level (which would of course apply across all
queues):

$ qconf -me global
…
complex_values    high_io=10

or as an RQS (resource quota set):

$ qconf -srqs io
{
   name         io
   description  Limit bandwidth starts
   enabled      TRUE
   limit        name bandwidth queues all.q to high_io=10
}

(Note that the consumable must be attached somewhere for the RQS to have something
to consume from; most likely you have it defined on the exechost already.) Then you
should get output like:

$ qquota
resource quota rule limit                filter
--------------------------------------------------------------------------------
io/bandwidth       high_io=7/10         queues all.q
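
As a side note, an RQS could also express your per-machine limit; a sketch with a
hypothetical second rule added to the same set (the rule name "per_host" is made up):

$ qconf -mrqs io
{
   name         io
   description  Limit bandwidth starts
   enabled      TRUE
   limit        name bandwidth queues all.q to high_io=10
   limit        name per_host hosts {*} to high_io=1
}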


> Per machine limit:
> qconf -se pc65-gsc:
> complex_values high_io=1
> 
> Submitting processes to use the resource:
> qsub -l high_io=1 -q all.q do_thing
> 
> Grid will track the resource:
> qstat -F high_io:
> al...@pc65-gsc.haib.org        BIP   0/16/16        5.25     linux-x64
>        hc:high_io=-4
> 317388 0.50887 Pe9f1e5cf2 flowers      r     04/13/2017 12:59:31     2
> 317389 0.50887 P2133afabd flowers      r     04/13/2017 12:59:31     2
> 317390 0.50887 Pae6a146a5 flowers      r     04/13/2017 12:59:31     2
> 317391 0.50887 P05685178e flowers      r     04/13/2017 12:59:31     2
> 317392 0.50887 P16fd5e5ae flowers      r     04/13/2017 12:59:31     2
> 
> (I don't know how to show queue-level resource consumption.)

All limits (global, queue, exechost) are active at the same time. The tightest one
is the effective limit, and the layer it comes from is reflected by the prefix in
the qstat -F output:

hc:high_io=-4

hc = host consumable
qc = queue consumable
gc = global consumable

(see `man qstat` for the meaning of these prefixes; the -F option can also be used
with `qhost`).
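
For completeness, the value behind each prefix is defined in the corresponding
configuration object (host name taken from your example):

$ qconf -se global   | grep complex_values   # gc: global consumable
$ qconf -sq all.q    | grep complex_values   # qc: queue consumable
$ qconf -se pc65-gsc | grep complex_values   # hc: host consumable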


> As you can see, grid does not limit high_io resource usage (neither by
> machine, as you can see here, or by queue, as I've had 65 high_io=1
> jobs running at once in all.q).  I'm assuming I missed some part of
> using consumables?  I thought the point was that the value couldn't go
> below zero (or, rather, that grid would not schedule a job in such a
> way that the value would go negative).

There were some issues where the complex could go negative, but I don't recall the
exact circumstances right now (besides creating the limit while something consuming
it is already running). A first approach would be to make it a "consumable YES"
complex only.

-- Reuti


