(I originally found this problem on a very busy FreeBSD 4.3 system
running with the bigtodo patch - it is much less likely to occur with
a standard qmail.)
First some background regarding trigger. When qmail-queue has a mail
for qmail-send it opens the named pipe, trigger, writes a byte to it,
closes trigger and exits.
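For readers who prefer code, here is a minimal sketch of that handshake using plain POSIX calls. It is not the qmail source: open_trigger() and pull_trigger() are hypothetical names, and creating the FIFO on the fly is an assumption made only to keep the example self-contained.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Reader side (qmail-send's role in this sketch).  Creating the FIFO
   here is an assumption, made only so the example stands alone. */
int open_trigger(const char *path)
{
  assert(path != 0);
  if (mkfifo(path, 0600) == -1 && errno != EEXIST) return -1;
  /* O_NONBLOCK: return at once even though no writer exists yet. */
  return open(path, O_RDONLY | O_NONBLOCK);
}

/* Writer side (qmail-queue's role): open, write one byte, close.
   Returns 0 if the byte was written, -1 otherwise. */
int pull_trigger(const char *path)
{
  int fd;
  ssize_t w;

  assert(path != 0);
  /* A non-blocking open for writing fails at once if the FIFO is
     missing or if no reader currently has it open. */
  fd = open(path, O_WRONLY | O_NONBLOCK);
  if (fd == -1) return -1;
  w = write(fd, "", 1);          /* a single NUL byte */
  close(fd);
  return w == 1 ? 0 : -1;
}
```

Note that the byte stays in the pipe after pull_trigger() returns, because the reader still has the FIFO open.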
qmail-send notices this trigger in the following loop:
    open trigger
    select: is trigger readable?
    ...
    todo_do()
    ...
    close trigger
    open trigger
    ...
    select: is trigger readable?
    etc.
A couple of notes on this loop:
o todo_do() involves a directory scan that can be expensive if lots
  of injections are occurring or if you use the bigtodo patch.
o The idea behind closing and opening trigger is to flush the byte
  written by qmail-queue, so that next time around the loop the
  select blocks until another qmail-queue comes along.
The problem I've found relates to when the flush occurs on a named
pipe. At least on FreeBSD, a named pipe is only flushed when no other
process has the pipe opened.
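This can be reproduced outside qmail. POSIX only promises that leftover FIFO data is discarded once every descriptor on it has been closed, so I would expect the sketch below to behave the same on other systems, though I have not checked it everywhere. The function name and its arguments are made up for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns how many bytes a freshly reopened reader finds in the FIFO:
   1 means the old byte survived the close/reopen, 0 means it was
   discarded, -1 means the demo itself failed.  `path` must not exist. */
int stale_bytes_after_reopen(const char *path, int writer_still_open)
{
  char c;
  int rfd, wfd;
  ssize_t n = -1;

  assert(path != 0);
  if (mkfifo(path, 0600) == -1) return -1;

  rfd = open(path, O_RDONLY | O_NONBLOCK);   /* qmail-send's end */
  wfd = open(path, O_WRONLY | O_NONBLOCK);   /* a qmail-queue's end */
  if (rfd != -1 && wfd != -1 && write(wfd, "", 1) == 1) {
    if (!writer_still_open) { close(wfd); wfd = -1; }
    close(rfd);                              /* the intended "flush" */
    rfd = open(path, O_RDONLY | O_NONBLOCK);
    if (rfd != -1) n = read(rfd, &c, 1);     /* 0 here means EOF */
  }
  if (rfd != -1) close(rfd);
  if (wfd != -1) close(wfd);
  unlink(path);
  return (int) n;
}
```

With writer_still_open set, the reopened descriptor still sees the old byte, which is the situation qmail-send is in whenever some qmail-queue has the trigger open during the close and open.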
On a very busy system the chance of this occurring shrinks, as there
is almost always one or more qmail-queue processes running.
Furthermore, the code order of qmail-send is such that the window
during which no qmail-queue process may have the trigger open is very
small: it is the tiny gap between the close and the open that
immediately follows it in trigger_set().
The degenerate case I see is that qmail-send starts spinning on the
select()--todo_do() loop as select() always indicates that the trigger
is readable. This spin involves a directory scan of todo which slows
the qmail-queue processes as they too are writing to the same
directory/file system. Since the qmail-queue processes are further
slowed, qmail-send continues to spin on a readable trigger.
In other words, in the tiny window that qmail-send leaves for the
kernel to flush the pipe, there is always at least one qmail-queue
process with the trigger open. The result is a resource-burning spin
that gets worse when the injection rate is high and regular (exactly
the situation on the servers where I noticed this).
Returning to the bigtodo patch, that of course exacerbates the
situation as the window between the close and open in trigger_set
forms an even smaller part of the loop.
Fortunately there are a couple of remedies.
At the very least, the flush window can be made substantially larger
by closing trigger as soon as the select returns.
A second and more defensive measure is to issue a non-blocking read on
the pipe to drain all qmail-queue bytes *prior* to the todo
scan. Perhaps both of these could be done in the trigger_pulled
routine. I've appended a patch that gives the idea in code (it's
untested).
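Independent of qmail's own ndelay and fd plumbing, the drain step can be sketched with plain POSIX calls. This is an illustration under my own assumptions rather than part of the patch: drain_nonblocking() is a hypothetical name, and unlike the patch it retries on EINTR and reports how much it discarded.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: discard everything currently buffered on fd
   without ever blocking.  Returns the number of bytes thrown away,
   or -1 if fd could not be switched to non-blocking mode. */
long drain_nonblocking(int fd)
{
  char buf[64];
  long total = 0;
  ssize_t n;
  int flags;

  assert(fd >= 0);
  flags = fcntl(fd, F_GETFL);
  if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
    return -1;

  for (;;) {
    n = read(fd, buf, sizeof buf);
    if (n > 0) { total += n; continue; }
    if (n == -1 && errno == EINTR) continue;   /* interrupted: retry */
    /* Stop on 0 (no writer has it open), EAGAIN (empty for now),
       or any other error. */
    break;
  }
  return total;
}
```

Draining does not depend on the kernel discarding anything, so it works even while qmail-queue processes hold the trigger open. Any byte written after the drain simply makes the next select() return, which is what we want.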
Question: has anyone else seen this? You most likely will only see it
on a very busy system that has bigtodo.
Regards.
*** trigger.orig.c Mon Jun 15 03:53:16 1998
--- trigger.c Wed Jul 25 16:50:40 2001
***************
*** 1,4 ****
--- 1,5 ----
#include "select.h"
+ #include "ndelay.h"
#include "open.h"
#include "trigger.h"
#include "hasnpbg1.h"
***************
*** 36,41 ****
int trigger_pulled(rfds)
fd_set *rfds;
{
! if (fd != -1) if (FD_ISSET(fd,rfds)) return 1;
return 0;
}
--- 37,55 ----
int trigger_pulled(rfds)
fd_set *rfds;
{
!  char buf[64];
!
!  if ((fd != -1) && FD_ISSET(fd,rfds))
!   {
!    ndelay_on(fd);
!    while (read(fd,buf,sizeof(buf)) > 0) ; /* drain pending bytes */
!    close(fd);
!    fd = -1;
! #ifdef HASNAMEDPIPEBUG1
!    if (fdw != -1)
!     { close(fdw); fdw = -1; } /* reset so trigger_set() won't re-close */
! #endif
!    return 1;
!   }
return 0;
}