Hello list,
I have a bit of a problem with our job submission.
We have a setup with four different 'priority' queues - very low, low,
medium, and high - with subordination. The setup actually works quite
well for our usage pattern - with the highest priority queue being
reserved for automatic data reduction procedures (which are supposed to
run whenever triggered).
Of late, we have noticed that jobs in the highest priority queue quite
often suspend jobs running in the medium priority queue, even though
there are nodes that do not have any medium priority jobs on them. It's
not a big problem, but it is annoying. So, I'm after ideas as to how to
make the number of jobs running in the medium priority queue a factor in
the scheduling decision.
One of the problems is (I suspect) that there are always jobs running in
the very low priority queue - i.e. there is always load on the nodes. I
suppose that might skew the scheduling decision a bit (as it makes the
load average misleading). From our point of view, we'd like the scheduler
to basically disregard the load average and instead look at how many jobs
are already running on each host when making a scheduling decision.
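
To make the preference concrete, here is a toy sketch of the ordering we would like. It is not how the scheduler works internally; the host names, the numbers, and the `pick_host` helper are all made up for illustration.

```python
def pick_host(hosts):
    """Return the host with the fewest medium priority jobs.

    `hosts` maps a host name to a dict with (hypothetical) keys
    'medium_jobs' and 'load'. The load value is deliberately ignored;
    ties are broken by host name so the result is deterministic.
    """
    return min(hosts, key=lambda name: (hosts[name]["medium_jobs"], name))


def pick_host_by_load(hosts):
    """For contrast: choose purely by load average."""
    return min(hosts, key=lambda name: (hosts[name]["load"], name))


# Made-up example: node02 is busy with very low priority work but has no
# medium priority jobs, so a new high priority job would suspend nothing there.
example = {
    "node01": {"medium_jobs": 2, "load": 0.5},
    "node02": {"medium_jobs": 0, "load": 3.9},
}
```

With these numbers, choosing by load picks node01 (and suspends medium priority jobs), while choosing by job count picks node02.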
I have tried a load sensor - basically counting the number of jobs in
the queue on a machine - but that didn't seem to make a difference,
which might be due to how the value is weighted, I suppose.
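
For reference, a minimal sketch of that kind of load sensor, written in Python. It is not our production script. It assumes the usual load sensor protocol (the execution daemon writes a line to the sensor's stdin to request a report, the sensor answers with a begin / host:name:value / end block, and exits on "quit"). The complex name `medium_jobs` and the helper names are hypothetical, and the job count is taken from text that looks like plain `qstat` output, which is an assumption about that format.

```python
import sys


def count_job_lines(qstat_output):
    """Count lines whose first field is a numeric job ID.

    Assumes qstat-style text where header and separator lines do not
    start with a number.
    """
    count = 0
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            count += 1
    return count


def format_report(host, name, value):
    """Build one report block: begin, host:name:value, end."""
    return "begin\n{}:{}:{}\nend\n".format(host, name, value)


def run_sensor(count_jobs, host, name, stdin=None, stdout=None):
    """Answer every request line with a report until 'quit' or EOF.

    `count_jobs` is any callable returning the current job count, so
    how the number is obtained is left to the caller.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    for line in iter(stdin.readline, ""):
        if line.strip() == "quit":
            break
        stdout.write(format_report(host, name, count_jobs()))
        stdout.flush()  # the daemon reads the report right away
```

In real use, `count_jobs` would run something like `qstat` restricted to the queue instance on the local host and pass its output to `count_job_lines`. The reported complex also has to be defined and referenced in the scheduler configuration before it can influence anything, which may be where the weighting goes wrong.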
Anyone got any bright ideas?
Tina
--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442