Daniel Pittman wrote:

DP> For the first, *nothing* that you do is likely to improve things much
DP> other than rewriting the rules themselves; this can be done equally well
DP> with Perl.

Rule optimization is proceeding. You might find a better/faster regex
engine, but you'll probably have to re-optimize the rules for that
engine vs the Perl engine. I think we're going to be focusing on
optimizing the rules for the Perl engine, which could improve things a
lot from where they are now (things like prepending \b's where
appropriate, etc).

DP> For the second, rewriting SpamAssassin to use a streaming, single pass
DP> algorithm would be (ahem) challenging. I don't suggest it. :)

I've been tossing this idea around some. I think it bears thinking
about, even though it is going to be (ahem) challenging. Maybe not do
just a single pass, but at least do a lot *fewer* passes.

DP> > If nothing is done in that. I am ready to help in such project. If
DP> > anybody is interested please mail me , and lets start.
DP>
DP> Check the mailing list archives.

I'm planning on picking up the C stuff which was mentioned here before
and taking a look at it. I think the disadvantage of the C code is
definitely portability, and also flexibility in terms of the non-regex
rules. You not only have to write the scanner parts of SA, but also the
EvalTests stuff, and also all the network tests, etc, etc.

DP> > If I am completely out of point and what I am thinking of will not
DP> > help in getting better performance. Please tell me :)
DP>
DP> It's not likely to because you have not clearly identified where the
DP> performance problems lie.
DP>
DP> Every time you fork a process, you pay a huge cost. Avoiding that would
DP> improve your throughput dramatically. Using spam[cd] you pay at least
DP> *three* forks per message, at best, and probably closer to five.

Why 3 forks? You'll have to fork spamc, and spamd forks (but that will
probably be a cheap fork 'cos it's fork only, not fork-and-exec). I
count 2 there.

DP> Fix that first, if you want to fix anything.
DP> Grab, or write, a version of spamproxyd that you trust[1] with your
DP> email, then have inbound SMTP talk directly to that and have it relay
DP> on to the real MTA.

Yes, that is definitely the way to go with high-volume systems. You want
to get as tightly bound into your MTA as possible, ideally with a
milter-like thing, or using something like a spamproxyd.

DP> That should give you one or, with extra work, less than one fork per
DP> message. That is the best way to improve your performance.

Less than one fork per message seems unlikely, unless you get really
clever. And as far as spamd->spamd child forks are concerned, those
should be really, really cheap, so avoiding them probably won't gain you
much (assuming a non-naive OS fork implementation that does
copy-on-write for the process memory space).

DP> Oh, and consider SMP hardware in this; SA testing is something that
DP> should give you as close to linear performance improvement as you will
DP> ever see in the real world -- for exactly the same reason that using
DP> threads is a bad idea. :)

Yup, and even consider running spamd on multiple boxes, with spamc
switching between them.

C

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
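
On running spamd on multiple boxes with the client switching between
them: here is a minimal sketch of what that selection could look like,
written in Python for brevity. It is not how spamc itself does it. The
host names and the make_picker / connect_with_failover helpers are
hypothetical, invented for illustration; the only fact assumed about
spamd is that it listens on port 783 by default.

```python
import itertools
import socket

# Hypothetical spamd boxes; 783 is spamd's default port.
SPAMD_HOSTS = [("spamd1.example.com", 783), ("spamd2.example.com", 783)]


def make_picker(hosts):
    """Return a function that hands out hosts in round-robin order."""
    rotation = itertools.cycle(hosts)
    return lambda: next(rotation)


def connect_with_failover(hosts, pick, connect=socket.create_connection,
                          timeout=5.0):
    """Try each host at most once, starting from the next one in rotation.

    Returns (host, connection). Raises ConnectionError if every host fails.
    """
    last_error = None
    for _ in range(len(hosts)):
        host = pick()
        try:
            return host, connect(host, timeout)
        except OSError as err:
            # This box is down or unreachable; move on to the next one.
            last_error = err
    raise ConnectionError(f"no spamd host reachable: {last_error}")
```

A caller would build one picker per client process with
make_picker(SPAMD_HOSTS) and call connect_with_failover() once per
message, so load spreads across the boxes and a dead box only costs one
failed connect attempt.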