On Tue, 2002-02-19 at 13:02, Arpi wrote:
> Hi,
>
> > I think what would be a lot more interesting is spamd in C or C++. The
> > major benefit I can think of in going to C is performance (though I'm
> > not necessarily convinced you'll beat perl for doing text processing),
> > and if performance is what you care about, you'll be wanting to use
> > spamd anyway, not spamassassin.
>
> Nope.
> For the perl version, spamd+spamc solution (i would call it a messy
> hack) is a workaround for perl's 'booting/startup' overload.
> For the C version, there is no such overload, by storing precompiled regexps
> in file (or compiled-in data) i can reduce 'booting' time to <5ms.
> (currently it's ~12ms on p4).

It's really not so messy a hack, and it's designed for a few purposes:

1. Work around perl's startup overhead (though this would be more
   easily done with dump on platforms supporting that).
2. Reduce slow I/O on the machine by reading global config files just
   once from disk, then forking. This relies on the OS doing
   copy-on-write for the memory pages in the forked process, but most
   OSes these days do that. Otherwise you probably lose the I/O
   advantage when the process' memory space is copied on fork.
3. Allow for far greater loads than will fit on a single mail
   processing machine (regardless of how many CPUs you cram in your
   starfire box) by enabling the processing load to be spread around a
   network. The network I/O overhead is not all that significant, and
   if you're running spamc/spamd on the same machine, communicating
   over the local loopback TCP interface, your OS is responsible for
   making sure that's done efficiently. If it's much slower than using
   a shell pipe, you need a new OS.

> I don't like the idea of sending mails through the (local) network to a
> daemon and then get the results back... it's a really big overload compared
> to C version, while it still much better than restarting the perl version
> for each mail...

Well, you're sending the mails through the shell's pipe. IPC to a local
process over a shell pipe vs. TCP through loopback should show a
negligible performance difference, or you need to get a new OS (or patch
the one you have).

> > But writing a forking server in C is fairly tricky, at least to get it
> > right. Particularly if you want to make it reasonably cross-platform.
> > And not have big security holes.
> >
> > I think it'd be interesting to look at if you had it fully working on at
> > least one big platform, completely implementing the SPAMD network
> > interface. And then only if it showed better performance.
>
> I don't even plan to implement 100% compatible alternative, i'll probably
> leave some checks out.
> At least now i won't implement all that eval's in C,
> and will leave all network tests (they can be done by the MTA if needed).

Well, they can't really be done by the MTA in most cases, unless you
have a really fancy MTA. The network checks are mostly not done against
the envelope contents (which is what MTAs normally check), but against
the email header information. Also, they do things like razor checking,
etc., which MTAs don't normally do. I'm not saying any of this is
critical, and I run spamd -L anyway.

> Anyway I have some idea to speed up this thingie even more.
> For example, doing the regexp checks at 2-3 passes. First the cheap (fast)
> ones, and the big (high score) ones. Then if score <=0 i'll stop and return
> NO_SPAM. If there was any positive score at first pass, i'll continue with
> expensive (slower) checks. It will probably lower check quality a little bit
> (more possible false report) but increase performance a lot (n times).

This is probably not a bad idea. It would probably be useful to share
the work here and backport it to the perl versions.

> For this i have to do some statistics analyzing of negative/positive hits on
> a big enough spam and nospam collection. It will also slow the less used
> checks, they may be left out, or at least moved to 2nd/3rd pass...

You should have the corpus access instructions now :)

> So, again, the goals are a bit different. I don't plan to do 1:1 rewrite of
> spamassassin or spamd program!
>
> Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu

I love mplayer :)

C

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
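
For what it's worth, the multi-pass idea quoted above could look roughly
like the sketch below. It is in Python only for brevity, not C or perl,
and it is not how spamassassin or spamd actually works. The rule names,
patterns, scores, and the 5.0 threshold are all made up for illustration;
none of them are real SpamAssassin rules.

```python
import re

# Hypothetical first-pass rules: cheap to run and/or high scoring.
# Each entry is (name, compiled pattern, score). All values are invented.
FIRST_PASS_RULES = [
    ("FREE_MONEY", re.compile(r"free money", re.I), 2.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.M), 1.5),
]

# Hypothetical second-pass rules: slower patterns, only run when needed.
SECOND_PASS_RULES = [
    ("OBFUSCATED_WORD",
     re.compile(r"v[\W_]*[i1!][\W_]*a[\W_]*g[\W_]*r[\W_]*a", re.I), 3.0),
    ("LONG_HEX_BLOB", re.compile(r"[0-9a-f]{32,}"), 1.0),
]

THRESHOLD = 5.0  # assumed spam cutoff, chosen arbitrarily for this sketch


def run_pass(rules, message):
    """Run one set of rules; return (total score, list of rule names hit)."""
    hits = [(name, score) for name, pattern, score in rules
            if pattern.search(message)]
    return sum(score for _, score in hits), [name for name, _ in hits]


def classify(message):
    """Return (is_spam, score, hits) using the two-pass early-exit scheme."""
    score, hits = run_pass(FIRST_PASS_RULES, message)
    if score <= 0:
        # Nothing suspicious in the first pass: stop here and skip the
        # expensive rules entirely.
        return False, score, hits
    extra, more = run_pass(SECOND_PASS_RULES, message)
    score += extra
    hits += more
    return score >= THRESHOLD, score, hits
```

Note the trade-off mentioned in the quote: a message that would only
trip a second-pass rule is never checked against it, so it comes back as
not spam. Which rules belong in which pass is exactly what the hit
statistics over a spam/nonspam corpus would have to decide.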