On 01-Jun-11 08:39, Benny Lofgren wrote:
On 2011-06-01 17.16, Christiano F. Haesbaert wrote:
On 1 June 2011 11:01, LeviaComm Networks <n...@leviacomm.net> wrote:
On 01-Jun-11 05:46, Benny Lofgren wrote:

On 2011-05-31 14.45, Artur Grabowski wrote:

The load average is a decaying average of the number of processes in
the runnable state or currently running on a cpu or in the process of
being forked or that have spent less than a second in a sleep state
with sleep priority lower than PZERO, which includes waiting for
memory resources, disk I/O, filesystem locks and a bunch of other
things. You could say it's a very vague estimate of how much work the
cpu might need to be doing soon, maybe. Or it could be completely
wrong because of sampling bias. It's not very important so it's not
really critical for the system to do a good job guessing this number,
so the system doesn't really try too hard.
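
Roughly speaking it is nothing fancier than an exponentially damped
average of periodic run-queue samples. A toy illustration (not the
actual kernel code; the sample interval, constants and run-queue
numbers below are made up):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double avg = 0.0;               /* the "1 minute" figure */
    const double interval = 5.0;    /* seconds between samples */
    const double decay = exp(-interval / 60.0);
    int samples[] = { 3, 4, 2, 5, 1, 0, 0, 2 };  /* fake run-queue lengths */
    int i, n;

    for (i = 0; i < (int)(sizeof(samples) / sizeof(samples[0])); i++) {
        n = samples[i];
        /* fold the new sample into the damped average */
        avg = avg * decay + n * (1.0 - decay);
        printf("sample %d: nrun=%d loadavg=%.2f\n", i, n, avg);
    }
    return 0;
}

(The sampling bias mentioned above comes from the fact that only those
periodic snapshots count, so anything that consistently sleeps across
the sample points never shows up in the number at all.)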

This number may tell you something useful, or it might be totally
misleading. Or both.

One thing that often bites me in the butt is that cron relies on the
load average to decide if it should let batch(1) jobs run or not.

The default is that if cron sees a loadavg > 1.5 it keeps the batch
job enqueued until the load drops below that value. As I often see
much, much higher loads on my systems, invariably I find myself
wondering why my batch jobs never finish, only to discover that they
have yet to run.
*duh*
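
In rough terms the gate is just a comparison against the 1-minute
figure; a sketch of the idea (not cron's actual source, and the macro
name is mine):

#include <stdio.h>
#include <stdlib.h>

#define BATCH_MAXLOAD 1.5    /* the 1.5 default mentioned above */

int
batch_may_run(void)
{
    double la[1];

    if (getloadavg(la, 1) != 1)
        return 1;    /* can't read the load average, just let it run */
    return la[0] < BATCH_MAXLOAD;
}

int
main(void)
{
    printf("batch jobs %s be released right now\n",
        batch_may_run() ? "would" : "would not");
    return 0;
}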

So whenever I remember to, on every new system I set up I configure a
different load threshold value for cron. But I tend to forget, so...
:-)

I have no really good suggestion for how else cron should handle this,
otherwise I would have submitted a patch ages ago...


I had tinkered with a solution for this:
Cron wakes up a minute before the batch run is scheduled to run. Cron
will then copy a random 4 KB sector from the hard disk to RAM, then run
either an MD5 or SHA hash against it. The whole process would be timed,
and if it completed within a reasonable amount of time for the system,
it would kick off the batch job.
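
Roughly like the following throwaway sketch. The probe file, the time
limit and the cheap stand-in checksum are placeholders to be tuned per
machine; the real thing would do the MD5/SHA step as described:

#include <sys/types.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define PROBE_FILE "/var/db/probe.bin"  /* placeholder: any big file or device you can read */
#define BLOCKSIZE  4096
#define MAX_MS     50.0                 /* "reasonable time", tuned per machine */

int
main(void)
{
    unsigned char buf[BLOCKSIZE];
    struct timespec t0, t1;
    unsigned long sum = 0;
    double ms;
    off_t off;
    int fd, i;

    clock_gettime(CLOCK_MONOTONIC, &t0);

    if ((fd = open(PROBE_FILE, O_RDONLY)) == -1)
        err(1, "open %s", PROBE_FILE);
    /* random 4 KB-aligned offset somewhere in the first gigabyte */
    off = (off_t)arc4random_uniform(1024 * 1024 * 1024 / BLOCKSIZE) * BLOCKSIZE;
    if (pread(fd, buf, sizeof(buf), off) != (ssize_t)sizeof(buf))
        err(1, "pread");
    close(fd);

    /* stand-in for the MD5/SHA step: any fixed chunk of CPU work will do */
    for (i = 0; i < BLOCKSIZE; i++)
        sum = sum * 31 + buf[i];

    clock_gettime(CLOCK_MONOTONIC, &t1);
    ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
        (t1.tv_nsec - t0.tv_nsec) / 1e6;

    printf("probe: %.2f ms (checksum %lx)\n", ms, sum);
    return ms <= MAX_MS ? 0 : 1;    /* exit 0 = OK to release the batch run */
}

Whatever wraps the batch submission would then only go ahead when the
probe exits 0.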

This was the easiest way I thought of to measure the actual
performance of the system at any given time, since it measures the
entire system and closely emulates actual work.

While this isn't really the right thing to do, I found it to be the most
effective on my systems.



You really think cron should be doing its own calculation? I don't
like that *at all*.

Can't we just have a higher default threshold for cron ?
Can't we default to 0?

I think this is something that should be looked into: if we admit the
load average is a shitty measure, we shouldn't rely on it for running
cron jobs.

I hereby vote for default to 0. (Thank god this isn't a democracy :-) )

I didn't really like Christopher's suggestion either.

I don't like it either, but it's the only way to get my file server to run batch jobs without noticeable performance loss.

For one thing, *any* kind of attempt at userland performance
measurement will over time (as hardware gets faster) become less
accurate, to the point of being unusable unless it is tuned, and we
really DON'T want to have to tune cron (or anything else in userland,
for that matter) for different architectures and/or generations of
systems.


I never intended my suggestion to be used as-is in OpenBSD proper; I just mentioned it as what I am doing to work around bogus load values that keep my system from doing what it needs to.

Also, what kind of metric should cron measure? What if the batch job
is CPU-bound only, but will take two weeks to run and it's simply most
convenient to start it using batch(1)? Or what if the second batch job
is I/O-bound and doesn't get to run because I just started the
two-week CPU-bound job and cron only measures that?

The jobs I run on my file servers require a bit of everything.


In fact I really don't feel the load average is such a bad metric for
cron to use; it's just that the default was probably set a millennium
ago and hasn't changed since.

Easiest is to set the default to 0.0 as you suggest, disabling the
feature altogether; more complicated, but perhaps better in this world
of multi-core systems, would be to set it to the number of cores.
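
Something along these lines, just to show the shape of the comparison
(a sketch, not a patch against cron):

#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    int mib[2] = { CTL_HW, HW_NCPU };
    int ncpu;
    size_t len = sizeof(ncpu);
    double la[1];

    if (sysctl(mib, 2, &ncpu, &len, NULL, 0) == -1)
        ncpu = 1;       /* be conservative if we can't tell */
    if (getloadavg(la, 1) != 1)
        la[0] = 0.0;    /* no load info: let it run */

    printf("loadavg %.2f, %d cpus: batch %s run\n",
        la[0], ncpu, la[0] < (double)ncpu ? "may" : "may not");
    return la[0] < (double)ncpu ? 0 : 1;
}

getloadavg(3) and the hw.ncpu sysctl are both already there, so it
would not take much.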

I agree with that, but I've had times where the load was genuinely high and the batch jobs caused the system to slow to the point of being unusable, or at least to the point where the users started complaining about it.


(Which also reminds me, sendmail has a similar feature using the load
average, which has also bugged me from time to time. There might be
others as well, but none come to mind right now.)


Regards,
/Benny



--
-Christopher Ahrens
