2003-09-05T04:41:22 James Stevens:
> Forking for each incoming connection could work out expensive,
> [...]

[...] but not badly so at all on Linux. Other OSes aren't so
swift; I can damn near fork (and context switch) faster than
Solaris:-).

> A more sophisticated variation would be to load the database, open the 
> main socket, then fork (say) 5 child process all running a blocking 
> "accept", one of the child processes (whoever happens to have CPU at the 
> time) will then be given the incoming connection and scan the data.

I've done this, and it works out _magnificently_ for email content
scanning. An email content scanner is sandwiched between two bits of
MTA, so the MTA gets to have the Big Picture control over
concurrency management.

In my code, the master binds, then forks off N children (with quick
naps between, to keep from killing the system), then goes to sleep,
waiting for a child to exit. On healthy OSes the children just jump
right into accept on the sockets, letting the OS dispatch
connections to children as it wishes. On sick, sick platforms this
produces errors, so the children dispatch off a semaphore so that
only one child is attempting to accept at a time.
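
For illustration, here is a minimal sketch of that layout in Python
rather than the original perl. The names prefork, serve_child and
accept_lock are made up for this example, and the optional lock
stands in for the semaphore mentioned above; it is a sketch of the
idea, not the smtpprox code.

```python
import contextlib
import os
import socket
import time


def serve_child(listener, handle, accept_lock=None):
    """Child loop: block in accept() on the shared socket, handle, repeat."""
    # On platforms where concurrent accept() misbehaves, pass a lock created
    # before the fork (e.g. multiprocessing.Lock()) so that only one child
    # sits in accept() at a time. Otherwise let the OS pick a child.
    guard = accept_lock if accept_lock is not None else contextlib.nullcontext()
    while True:
        with guard:
            conn, _addr = listener.accept()
        with conn:
            handle(conn)


def prefork(listener, nchildren, handle, accept_lock=None, nap=0.05):
    """Fork nchildren workers that share one listening socket; return pids."""
    pids = []
    for _ in range(nchildren):
        pid = os.fork()
        if pid == 0:
            # Child: never return into the master's code path.
            try:
                serve_child(listener, handle, accept_lock)
            finally:
                os._exit(0)
        pids.append(pid)
        time.sleep(nap)  # quick nap between forks to go easy on the system
    return pids
```

The master would bind and listen first, call prefork(listener, N,
handle), and then sleep in os.wait() until a child exits.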

> The child could then either die (and be re-started by the master) or go 
> into another blocking "accept". You could then allow the child to scan, 
> say 10 jobs, before it dies (and is re-started by the master). This is 
> basically how Apache works.

I had a configurable minjobs and maxjobs (defaults 100 and 200,
sounds like clamd might want to start a little lower:-), and each
child rolled a random number uniformly between those two and
serviced just that many before exiting; this schmeared the child
exits out over plenty of time so the master didn't suddenly find all
its children gone and service stalled until it could re-fork 'em.
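
A rough sketch of the random job budget and the re-forking master,
again in Python and not taken from the perl. The function name
supervise and the max_respawns parameter are assumptions of this
example (max_respawns exists only so a demo run can terminate); the
100 and 200 defaults are the ones mentioned above.

```python
import os
import random


def supervise(nchildren, do_job, minjobs=100, maxjobs=200, max_respawns=None):
    """Keep nchildren workers alive; return how many child exits were reaped."""

    def spawn():
        pid = os.fork()
        if pid == 0:
            try:
                # Fresh generator per child, seeded from the OS, so that
                # siblings do not inherit identical state and roll the
                # same budget.
                rng = random.Random()
                # randint is inclusive on both ends.
                for _ in range(rng.randint(minjobs, maxjobs)):
                    do_job()
            finally:
                os._exit(0)
        return pid

    live = {spawn() for _ in range(nchildren)}
    respawns = 0
    reaped = 0
    while live:
        pid, _status = os.wait()  # master sleeps until some child exits
        if pid not in live:
            continue
        live.discard(pid)
        reaped += 1
        # Replace the departed child, unless the demo limit has been reached.
        if max_respawns is None or respawns < max_respawns:
            live.add(spawn())
            respawns += 1
    return reaped
```

Because each child draws its own budget, the exits are spread out in
time and the master normally has only one replacement to fork at
once. In a real server do_job would be one accept-and-scan cycle.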

> Apache is slightly more sophisticated still, [...]

Indeed, but it's solving a harder problem, adapting gracefully to
the fractally chaotic load a public webserver gets. An email content
analyzer can be presented a far, far better conditioned load by its
surrounding MTA.

> The master should also have a SIGALRM back stop, so that if it
> locks up, it dies. The master would then be run through inittab so
> that it is always immediately re-started.

That far I don't go; if a simple networking parent can't remain
stable and alive, I'll hunt it down and fix it. Or delete it.

This reminds me of djb's daemontools, where absolutely rock-solid
daemons like dnscache and tinydns are run under a respawner that's
run under an init-replacement respawner that's run under init to
make sure it's respawned as necessary.... Thanks anyway, I run my
djbdns components out of init scripts:-).

> This would give a really bullet proof scanning service and allow
> for a reasonable level of leaking / bugs in the scanning process
> itself.

Arranging to have the mime-hacker process a bounded number of jobs
before exiting, and having crashes in it not deny the whole service,
is definitely appropriate; MIME parsing is an impossible job to do
completely correctly, and is a fiendishly difficult job to do even
usefully competently. MIME is blecherous.

I'm less excited by massive efforts to carefully arrange for the
networking parent to be supervised and monitored and restarted if
necessary, and for the supervisor that monitors that process to be
so monitored, etc. If the parent process that bound the socket and
forks the children should die, my MTA monitoring will set off alarms
(that's only one of a class of possible environmental problems that
could give it constipation), and I'll figure out what happened and
fix it.

Oh, and about my code, if anybody wants it for anything you're
welcome to it, <URL:http://bent.latency.net/smtpprox/>, but as it's
an SMTP proxy written in perl, it probably isn't directly useful to
clamd developers; the above description likely has all the goodies
you'd be able to get out of the perl.

-Bennett
