Daniel Pittman wrote:

DP> For the first, *nothing* that you do is likely to improve things much
DP> other than rewriting the rules themselves; this can be done equally well
DP> with Perl.

Rule optimization is proceeding.  You might find a better/faster regex engine,
but you'll probably have to re-optimize the rules for that engine vs the Perl
engine.  I think we're going to be focusing on optimizing the rules for the
Perl engine, which could improve things a lot from where they are now (things
like prepending \b's where appropriate, etc).
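
To make the \b idea concrete, here is a minimal sketch.  It is in Python rather
than Perl, and the rule pattern and sample text are made up for illustration
(they are not actual SA rules).  It only shows what the anchor changes about
where a rule may match; whether it also makes a given rule faster depends on
the engine, so measure before relying on it.

```python
import re

# Hypothetical rule body, before and after prepending a word-boundary anchor.
unanchored = re.compile(r"free money", re.IGNORECASE)
anchored = re.compile(r"\bfree money", re.IGNORECASE)

# Made-up sample text: "free money" appears once inside other words
# ("Carefree moneylenders") and once as a phrase of its own.
text = "Carefree moneylenders offer FREE MONEY today"


def match_starts(pattern, s):
    """Return the start offset of every match of pattern in s."""
    return [m.start() for m in pattern.finditer(s)]


unanchored_hits = match_starts(unanchored, text)
anchored_hits = match_starts(anchored, text)
```

The anchored version can only start a match at a word boundary, so it skips the
hit buried inside "Carefree moneylenders" and keeps the standalone phrase.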

DP> For the second, rewriting SpamAssassin to use a streaming, single pass
DP> algorithm would be (ahem) challenging. I don't suggest it. :)

I've been tossing this idea around some.  I think it bears thinking about, even
though it is going to be (ahem) challenging.  Maybe not do just a single pass,
but at least do a lot *fewer* passes.
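
As a rough sketch of what "fewer passes" could look like for plain regex rules:
fold several patterns into one alternation with a named group per rule, so one
scan of the text reports which rules fired.  This is Python, not SA code, and
the rule names and patterns are invented for the example.  One caveat I know
of: a single alternation only reports non-overlapping matches, so two rules
that match overlapping text would need more care than this.

```python
import re

# Hypothetical body rules: name -> pattern. Not real SpamAssassin rules.
RULES = {
    "FREE_MONEY": r"\bfree money\b",
    "ACT_NOW": r"\bact now\b",
    "NO_OBLIGATION": r"\bno obligation\b",
}


def hits_many_passes(text):
    """Baseline: scan the text once per rule."""
    return {
        name
        for name, pat in RULES.items()
        if re.search(pat, text, re.IGNORECASE)
    }


# One alternation with a named group per rule, so a single scan can
# report which rules matched.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in RULES.items()),
    re.IGNORECASE,
)


def hits_one_pass(text):
    """Single scan: the name of the group that matched identifies the rule."""
    return {m.lastgroup for m in COMBINED.finditer(text)}
```

For non-overlapping rules like these, both functions return the same set of
rule names; the second just walks the message once instead of once per rule.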

DP> > If nothing is done in that. I am ready to help in such project. If
DP> > anybody is interested please mail me , and lets start.
DP>
DP> Check the mailing list archives.

I'm planning on picking up the C stuff which was mentioned here before and
taking a look at it.  I think the disadvantage of the C code is definitely
portability, and also flexibility in terms of the non-regex rules.  You not only
have to write the scanner parts of SA, but also the EvalTests stuff, and also
all the network tests, etc, etc.

DP> > If I am completely out of point and what I am thinking of will not
DP> > help in getting better performance. Please tell me :)
DP>
DP> It's not likely to because you have not clearly identified where the
DP> performance problems lie.
DP>
DP> Every time you fork a process, you pay a huge cost. Avoiding that would
DP> improve your throughput dramatically. Using spam[cd] you pay at least
DP> *three* forks per message, at best, and probably closer to five.

Why 3 forks?  You'll have to fork spamc, and spamd forks (but that will probably
be a cheap fork because it's fork only, not fork-and-exec).  I count 2 there.

DP> Fix that first, if you want to fix anything. Grab, or write, a version
DP> of spamproxyd that you trust[1] with your email, then have inbound SMTP
DP> talk directly to that and have it relay on to the real MTA.

Yes, that is definitely the way to go with high-volume systems.  You want to get
as tightly bound into your MTA as possible, ideally with a milter-like thing, or
using something like a spamproxyd.

DP> That should give you one or, with extra work, less than one fork per
DP> message. That is the best way to improve your performance.

Less than one fork per message seems unlikely, unless you get really clever.
And as far as spamd->spamd child forks are concerned, those should be really,
really cheap, so avoiding them probably won't gain you much (assuming a
non-naive OS fork implementation where it's going to do copy-on-write for the
process memory space).
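
To illustrate the fork-only case, here is a minimal POSIX-only sketch in Python
(not spamd's actual code).  The parent builds its state once, and each forked
child inherits it without reloading anything, since there is no exec.  The
RULES table, the scoring function and the pipe-based result passing are all
assumptions made up for the example.

```python
import os

# Stand-in for the expensive state the parent builds once (compiled
# rules and so on). Hypothetical phrases and scores.
RULES = {"free money": 2.5, "act now": 1.0}


def score(message):
    """Toy scorer: add up the points of every phrase found in the message."""
    lower = message.lower()
    return sum(points for phrase, points in RULES.items() if phrase in lower)


def score_in_child(message):
    """Fork without exec: the child inherits RULES and never reloads it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute the score, send it back, and exit without
        # returning into the parent's code path.
        try:
            os.close(read_fd)
            os.write(write_fd, str(score(message)).encode())
        finally:
            os._exit(0)
    # Parent: read the child's answer, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        result = reader.read()
    os.waitpid(pid, 0)
    return float(result)
```

With copy-on-write, the child shares the parent's pages until one of them
writes, so the per-message cost is mostly the fork call itself rather than
rebuilding the rule set.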

DP> Oh, and consider SMP hardware in this; SA testing is something that
DP> should give you as close to linear performance improvement as you will
DP> ever see in the real world -- for exactly the same reason that using
DP> threads is a bad idea. :)

Yup, and even consider running spamd on multiple boxes, with spamc switching
between them.
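
A sketch of the switching logic, assuming a simple round-robin with failover on
the client side.  This is Python and is not how spamc is implemented; the class
name, the host list and the injectable connect function are all hypothetical.

```python
import itertools
import socket


class SpamdPool:
    """Hand out connections to several spamd hosts in round-robin order."""

    def __init__(self, hosts, connect=None):
        # hosts: list of (hostname, port) pairs, all hypothetical here.
        self._hosts = list(hosts)
        self._cycle = itertools.cycle(range(len(self._hosts)))
        # connect is injectable so the logic can be tried without a network.
        self._connect = connect or (
            lambda host, port: socket.create_connection((host, port), timeout=5)
        )

    def open(self):
        """Return ((host, port), connection) from the next reachable host."""
        last_error = None
        # Try each host at most once per call, starting after the last one used.
        for _ in range(len(self._hosts)):
            host, port = self._hosts[next(self._cycle)]
            try:
                return (host, port), self._connect(host, port)
            except OSError as exc:
                last_error = exc
        raise ConnectionError("no spamd host reachable") from last_error
```

Each call moves on to the next box, and a box that refuses the connection is
skipped for that message, so one dead scanner does not stall the mail flow.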

C

