On Tue, 2002-02-19 at 13:02, Arpi wrote:
> Hi,
> 
> > I think what would be a lot more interesting is spamd in C or C++.  The
> > major benefit I can think of of going to C is performance (though I'm
> > not necessarily convinced you'll beat perl for doing text processing),
> > and if performance is what you care about, you'll be wanting to use
> > spamd anyway, not spamassassin.
> 
> Nope.
> For the perl version, spamd+spamc solution (i would call it a messy
> hack) is a workaround for perl's 'booting/startup' overload.
> For the C version, there is no such overload, by storing precompiled regexps
> in file (or compiled-in data) i can reduce 'booting' time to <5ms.
> (currently it's ~12ms on p4).

It's really not that messy a hack, and it's designed for a few
purposes:

1. Work around perl's 'booting/startup' overhead (though this would be
more easily done with dump on platforms supporting that).
2. Reduce slow I/O on the machine by reading global config files just
once from disk, then forking.  This relies on the OS doing copy-on-write
stuff for the memory pages in the forked process, but most OSes these
days do that.  Otherwise you probably lose the I/O advantage when you
copy the process' memory space on fork.
3. Allow for far greater loads than will fit on a single mail processing
machine (regardless of how many CPUs you cram in your starfire box) by
enabling the processing load to be spread around a network.  The network
I/O overhead is not all that significant, and if you're running
spamc/spamd on the same machine, communicating over the local loopback
TCP interface, your OS is responsible for making sure that's done
efficiently.  If it's much slower than using a shell pipe, you need a
new OS.
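
To make point 2 concrete, here is a minimal sketch of the load-once,
then-fork pattern. It is in Python for brevity and is not the actual
spamd code; load_config, score and handle_in_child are names made up
for illustration, and the rule list stands in for the real config
files. It assumes a Unix-like OS with fork().

```python
import os

def load_config():
    # Hypothetical stand-in for parsing the global config files.
    # In a real daemon this is the slow disk I/O done once at startup.
    return {"rules": ["viagra", "free money"]}

def score(config, message):
    # Count how many (made-up) rule strings appear in the message.
    text = message.lower()
    return sum(1 for rule in config["rules"] if rule in text)

def handle_in_child(config, message):
    """Fork a child that scores one message using the inherited config."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: config is already in memory, nothing is re-read from disk.
        os.close(read_fd)
        try:
            os.write(write_fd, str(score(config, message)).encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    # Parent: collect the result and reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as result:
        data = result.read()
    os.waitpid(pid, 0)
    return int(data)

if __name__ == "__main__":
    config = load_config()  # read once, before any fork
    for msg in ["Get FREE MONEY now", "lunch tomorrow?"]:
        print(msg, "->", handle_in_child(config, msg))
```

The parent pays the config cost once; each child starts with the same
pages mapped copy-on-write. How many pages actually stay shared depends
on the runtime (an interpreter that updates reference counts will dirty
more of them than a plain C program would).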

> I don't like the idea of sending mails through the (local) network to a
> daemon and then get the results back... it's a really big overload compared
> to C version, while it still much better than restarting the perl version
> for each mail...

Well, you're sending the mails through the shell's pipe.  IPC to a local
process over a shell pipe vs. TCP through loopback should differ
negligibly in performance, or you need to get a new OS (or patch the one
you have).

> > But writing a forking server in C is fairly tricky, at least to get it
> > right.  Particularly if you want to make it reasonably cross-platform. 
> > And not have big security holes.
> >
> > I think it'd be interesting to look at if you had it fully working on at
> > least one big platform, completely implementing the SPAMD network
> > interface.  And then only if it showed better performance.

> I don't even plan to implement 100% compatible alternative, i'll probably
> leave some checks out. At least now i won't implement all that eval's in C,
> and will leave all network tests (they can be done by the MTA if needed).

Well, they can't really be done by the MTA in most cases, unless you
have a really fancy MTA.  In most cases the network checks are run not
against the envelope contents (which is what MTAs normally check), but
against the email header information.  They also do things like razor
checking, which MTAs don't normally do.  I'm not saying any of this
is critical, and I run spamd -L anyway.

> Anyway I have some idea to speed up this thingie even more.
> For example, doing the regexp checks at 2-3 passes. First the cheap (fast)
> ones, and the big (high score) ones. Then if score <=0 i'll stop and return
> NO_SPAM. If there was any positive score at first pass, i'll continue with
> expensive (slower) checks. It will probably lower check quality a little bit
> (more possible false report) but increase performance a lot (n times).

This is probably not a bad idea.  It would be useful to share the work
here and backport it to the perl version.
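
For reference, this is roughly how I read the staged check, as a Python
sketch. The rules, scores and threshold below are invented for
illustration and are not real SpamAssassin rules; CHEAP_RULES,
EXPENSIVE_RULES, run_rules and check are hypothetical names.

```python
import re

# First pass: cheap patterns plus the high-scoring ones (made-up examples).
CHEAP_RULES = [
    (re.compile(r"free money", re.I), 2.0),
    (re.compile(r"click here", re.I), 1.5),
    (re.compile(r"^In-Reply-To:", re.I | re.M), -1.0),
]

# Later pass: slower patterns, only run if the first pass found something.
EXPENSIVE_RULES = [
    # Three or more consecutive all-caps words of 4+ letters.
    (re.compile(r"(?:\b[A-Z]{4,}\b\W+){3,}"), 1.0),
]

def run_rules(rules, text):
    # Sum the scores of every rule whose pattern matches the text.
    return sum(score for pattern, score in rules if pattern.search(text))

def check(text, threshold=3.0):
    """Return (verdict, score), skipping the expensive pass when possible."""
    first = run_rules(CHEAP_RULES, text)
    if first <= 0:
        # Nothing positive in the cheap pass: stop early.
        return "NO_SPAM", first
    total = first + run_rules(EXPENSIVE_RULES, text)
    return ("SPAM" if total >= threshold else "NO_SPAM"), total

if __name__ == "__main__":
    print(check("In-Reply-To: <x>\nsee you tomorrow"))
    print(check("FREE MONEY CLICK HERE NOW TODAY"))
```

The early return is where the speedup comes from, and also where the
extra false negatives come from: a message that only trips expensive
rules never gets to them.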

> For this i have to do some statistics analyzing of negative/positive hits on
> a big enough spam and nospam collection. It will also slow the less used
> checks, they may be left out, or at least moved to 2nd/3rd pass...

You should have the corpus access instructions now :)

> So, again, the goals are a bit different. I don't plan to do 1:1 rewrite of
> spamassassin or spamd program!

> Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu

I love mplayer :)

C

