Hi,

> On 19.04.2017 at 21:52, flowers-gridus...@hagsc.org wrote:
>
> Some of the jobs I run essentially just copy stuff to/from a remote
> filesystem, and as such more or less max out the available network
> bandwidth. This in itself is not a problem, but it can be if more than
> one such job runs on the same system, or if too many are running
> in total.
>
> The solution seemed straightforward - create a complex value that would
> limit per-machine and per-queue usage of high-I/O jobs, and have those
> jobs request the resource.
>
> First, the complex value itself:
>
> qconf -sc:
> #name     shortcut  type  relop  requestable  consumable  default  urgency
> high_io   io        INT   <=     YES          JOB         0        100

Are these parallel jobs? Otherwise "consumable YES" could do it too.
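For comparison, a sketch of the same definition with "consumable YES" (same columns as your "qconf -sc" output; the line can be changed with "qconf -mc"):

$ qconf -mc
#name     shortcut  type  relop  requestable  consumable  default  urgency
high_io   io        INT   <=     YES          YES         0        100

The difference matters only for parallel jobs: with "YES" the requested amount is debited once per granted slot, with "JOB" once per job.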
> Per queue limit:
>
> qconf -sq all.q:
> complex_values high_io=10

This limit works in a different way than you expect: it will limit the high_io complex to 10 per queue instance, i.e. per exechost. In case you have just one queue, it's essentially the same as defining it on the exechost level, just a matter of taste. In case you have more than one queue per exechost, it can be useful to split the consumption of a certain resource, e.g. 4 in one queue, 8 in the other, and in addition 10 per exechost in total.

To have a limit across the whole queue, you can either define it on a global level (which would be across all queues of course):

$ qconf -me global
…
complex_values high_io=10

or as an RQS (resource quota set):

$ qconf -srqs io
{
   name         io
   description  Limit bandwidth
   enabled      TRUE
   limit        name bandwidth queues all.q to high_io=10
}

(Note that the consumable must be attached somewhere to be consumed from for the RQS to work; most likely you have it defined on the exechost already.)

Then you should get an output like:

$ qquota
resource quota rule    limit            filter
--------------------------------------------------------------------------------
io/bandwidth           high_io=7/10     queues all.q

> Per machine limit:
>
> qconf -se pc65-gsc:
> complex_values high_io=1
>
> Submitting processes to use the resource:
>
> qsub -l high_io=1 -q all.q do_thing
>
> Grid will track the resource:
>
> qstat -F high_io:
> al...@pc65-gsc.haib.org BIP 0/16/16 5.25 linux-x64
>         hc:high_io=-4
>  317388 0.50887 Pe9f1e5cf2 flowers r 04/13/2017 12:59:31 2
>  317389 0.50887 P2133afabd flowers r 04/13/2017 12:59:31 2
>  317390 0.50887 Pae6a146a5 flowers r 04/13/2017 12:59:31 2
>  317391 0.50887 P05685178e flowers r 04/13/2017 12:59:31 2
>  317392 0.50887 P16fd5e5ae flowers r 04/13/2017 12:59:31 2
>
> (I don't know how to show queue-level resource consumption.)

All limits (global, queue, exechost) are active at the same time. The tightest one is used as the overall limit, and this is reflected by the prefix:

hc:high_io=-4

hc = host consumable
qc = queue consumable
gc = global consumable

(see `man qhost`; the -F option can also be used for `qhost`).

> As you can see, grid does not limit high_io resource usage (neither by
> machine, as you can see here, nor by queue, as I've had 65 high_io=1
> jobs running at once in all.q). I'm assuming I missed some part of
> using consumables? I thought the point was that the value couldn't go
> below zero (or, rather, that grid would not schedule a job in such a
> way that the value would go negative).

There were some issues when the complex could get negative, but I don't recall the exact circumstances right now (besides creating the limit while something consuming it is already running). A first approach would be to make it a "consumable YES" only complex.
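To illustrate the difference for parallel jobs (the PE name "smp" here is only an example, use whatever PE you have configured):

$ qsub -l high_io=1 -pe smp 2 -q all.q do_thing

With "consumable YES" this 2-slot job would debit high_io=2 on the exechost, while with "consumable JOB" it debits high_io=1 regardless of the slot count.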
--
Reuti