tags 332285 + fixed-upstream
thanks

On Mon, 10 Mar 2008 12:01:09 +0100, Petter Reinholdtsen <[EMAIL PROTECTED]> 
said:

> The consequence, on a machine with insufficient resources, is that
> munin will use up all resources and bring the machine to a complete
> standstill.  The first job starts, then the next job starts before
> the first job is finished and slows down the machine even more, and
> then the third job starts before the second (and perhaps also the
> first) job is finished, and so on.

What should have happened was:

munin-cron(1) starts and runs a sequence of munin components:
munin-update, munin-limits, munin-graph, munin-html.

munin-cron(2) starts 5 minutes later and runs the same sequence.  If
any of the components is already running, munin-cron will skip it and
try to run the next component in the sequence.
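A minimal sketch of that skip-if-locked behavior, assuming one advisory flock(2) lock file per component in a hypothetical lock directory. This is not munin's own locking code: the lock file layout is made up, the real component commands are replaced by a callback, and fcntl makes it Unix-only.

```python
import fcntl
import os
import tempfile
from typing import Callable, List

# Component order as described above; munin-cron runs them in sequence.
COMPONENTS = ["munin-update", "munin-limits", "munin-graph", "munin-html"]


def run_sequence(lock_dir: str, run: Callable[[str], None]) -> List[str]:
    """Run each component under its own non-blocking exclusive lock.

    If a component's lock is already held (an earlier munin-cron is still
    running it), skip that component and move on to the next one instead
    of waiting for it or starting a second copy.  Returns the components
    this invocation actually ran.
    """
    ran = []
    for name in COMPONENTS:
        # Hypothetical layout: <lock_dir>/<component>.lock
        path = os.path.join(lock_dir, name + ".lock")
        with open(path, "w") as lock_file:
            try:
                fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                continue  # already running elsewhere: skip it
            try:
                run(name)
                ran.append(name)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)
    return ran


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Pretend an earlier munin-cron is still busy with munin-graph.
        with open(os.path.join(d, "munin-graph.lock"), "w") as held:
            fcntl.flock(held, fcntl.LOCK_EX)
            print(run_sequence(d, lambda name: None))
            # prints ['munin-update', 'munin-limits', 'munin-html']
```

The non-blocking flag (LOCK_NB) is the important design choice: a blocking lock would make later munin-cron runs queue up behind a slow component, which is the pile-up described in the bug report.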

If (as an extreme example) each component uses more than 5 minutes to
run, you should have no more than 3 munin-cron processes, each running
one component.  (munin-limits is very quick, so I don't count that
one.)
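To make that worst case concrete, here is a small time-stepped simulation under made-up assumptions: a 5-minute cron interval, 7 minutes for each slow component, and 0 for munin-limits.  Each munin-cron walks the sequence and skips any component whose lock is held.  It illustrates the argument only and is not a model of munin's actual scheduler.

```python
from typing import List, Optional

ORDER = ["munin-update", "munin-limits", "munin-graph", "munin-html"]
# Hypothetical worst case (not measured values): every component except
# munin-limits outlasts the 5-minute cron interval.  Durations in minutes.
DURATIONS = {"munin-update": 7, "munin-limits": 0, "munin-graph": 7, "munin-html": 7}
INTERVAL = 5


class Proc:
    """One munin-cron invocation walking through ORDER."""

    def __init__(self) -> None:
        self.pos = 0                      # index of the next component to try
        self.running: Optional[str] = None
        self.left = 0                     # minutes left on the running component
        self.done = False


def peak_processes(minutes: int, use_locks: bool = True) -> int:
    """Simulate cron starts for `minutes`; return the peak number of live munin-cron processes."""
    procs: List[Proc] = []
    held = set()  # components whose lock is currently held
    peak = 0
    for t in range(minutes):
        if t % INTERVAL == 0:
            procs.append(Proc())  # cron fires a new munin-cron
        for p in procs:
            # An idle process walks forward until it starts a slow component or runs out.
            while not p.done and p.running is None:
                if p.pos == len(ORDER):
                    p.done = True
                    break
                name = ORDER[p.pos]
                p.pos += 1
                if use_locks and name in held:
                    continue  # another munin-cron is running it: skip
                if DURATIONS[name] > 0:
                    p.running, p.left = name, DURATIONS[name]
                    if use_locks:
                        held.add(name)
                # zero-duration components (munin-limits) finish immediately
        peak = max(peak, sum(not p.done for p in procs))
        # One minute passes for every running component.
        for p in procs:
            if p.running is not None:
                p.left -= 1
                if p.left == 0:
                    held.discard(p.running)
                    p.running = None
    return peak


if __name__ == "__main__":
    print(peak_processes(60, use_locks=True), peak_processes(60, use_locks=False))
    # prints 3 5
```

With the locks, every live process is holding the lock of exactly one slow component, so the peak stays at 3 however long the components take.  Without them, the peak grows with total run time divided by the cron interval (21 / 5 rounds up to 5 here), and it keeps growing if the load makes each run slower.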

What you are experiencing sounds like an issue with the locking system
not working as intended, rather than there not being a locking system,
since it looks like munin ignores the locks and runs its components
in parallel.

By the way: Are you still running 1.2.3-1, or have you upgraded to
1.2.5-2 since filing the bug?  I've yet to experience this on an etch
install, which has 1.2.5, but I seem to remember seeing this in the
past on servers with very high load.

> This is the failure mode I suggest implementing a guard against.
> Yes, in any case munin will not work properly, but at least it will
> be possible to try to fix it. :)

I'll mark this bug as "fixed-upstream", since the whole locking system
was rewritten some time ago in the 1.3 branch, and I'll get munin 1.3.4
out as soon as possible.

-- 
Stig Sandbeck Mathisen, Linpro


