Hi,

> > There are many pros and contras to C version, i won't list these, it's on
> > your fantasy.
>
> I can imagine a few of them, but am curious what you are thinking of as
> the pros and cons.

ok...

pros:
- portability (on unix platforms)
- much better speed (in my test on a P3, the perl version with spamd/spamc
  processes ~3 mails/sec, the C version about 11 mails/sec; the perl
  version runs more checks, though, so the comparison is neither final
  nor fair)
  note: the parsed, compiled regexps can be saved to a binary file and
  reused for every check until the ruleset changes. It saves ~15ms/mail
  (~20% speedup)
- much smaller resource requirements (the compiled ELF binary is 30k, the
  ruleset is 100k, and it uses ~300kb of RAM; that can't be compared to
  the overhead of perl 5.6)
- I don't like the perl language ;)

cons:
- parallel development (need to keep in sync with the perl version)
- portability (on non-unix)

I'm sure there are others, but the above ones are important for me.

A note on portability: yes, I know that perl should be portable, even more
than C. That's true in theory. We spent 2 weeks trying to get perl 5.6
working on an RS/6000 running AIX 4.2, without success. Perl compiled, but
one of its self-tests failed... I'm sure someone could hack it into
working, but I don't like installing tons of CPAN stuff along with a big,
not-really-portable perl interpreter on a high-traffic mail server.

So, my primary goal: make a small but very fast, efficient version to be
used on very high-traffic mail servers, and, by allowing several instances
to run at the same time, make it possible to profit from SMP. (afaik spamd
only processes a single mail at a time)

For this goal we may use asm optimization, and maybe switch from PCRE to
libtre (a faster regexp lib, but without back-references). I'll implement
very fast code (using hashed tree-based search) for spam phrase matching.
I found it very useful; many of the spam mails are caught by it.

> > Are you interested in such thing in CVS ?
>
> I'd love to see it.

ok.
I'll put it there as soon as I get it to production level (currently it
prints lots of debug output and only analyzes the mail to compute a score;
it doesn't edit headers or hand back the filtered mail, which we don't
really need anyway).

Yet another question: I've seen in the docs some statistics from running
spamassassin on ~40,000 spam mails and a similar amount of non-spam. Can I
access this spam collection/database? It would be useful for real-life
benchmarking. (currently I'm running my tests on ~1800 spam and ~60000
non-spam mails)

A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
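
For illustration, the spam phrase matching idea mentioned above could be
sketched in C roughly as follows. This is not the actual implementation:
instead of the hashed tree-based search it uses a plain open-addressing
hash table of two-word phrases, and all names (phrase_add, phrase_lookup,
score_text, TABLE_SIZE, MAX_WORD), the table size and the scores are made
up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TABLE_SIZE 1024   /* hypothetical size; must be a power of two */
#define MAX_WORD   32     /* longer words are truncated */

/* One slot of the table; an empty key marks a free slot. */
struct phrase { char key[2 * MAX_WORD]; double score; };
static struct phrase table[TABLE_SIZE];

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t hash_str(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
    return h;
}

/* Store a lowercase "word1 word2" phrase with its score.
 * Returns 0 on success, -1 if the key is empty, too long, or the table is full. */
static int phrase_add(const char *key, double score) {
    assert(key != NULL);
    if (key[0] == '\0' || strlen(key) >= sizeof table[0].key) return -1;
    uint32_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i].key[0] == '\0' || strcmp(table[i].key, key) == 0) {
            strcpy(table[i].key, key);
            table[i].score = score;
            return 0;
        }
        i = (i + 1) & (TABLE_SIZE - 1);   /* linear probing */
    }
    return -1;
}

/* Return the score of a phrase, or 0.0 if it is not in the table. */
static double phrase_lookup(const char *key) {
    uint32_t i = hash_str(key) & (TABLE_SIZE - 1);
    for (int probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i].key[0] == '\0') return 0.0;
        if (strcmp(table[i].key, key) == 0) return table[i].score;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0.0;
}

/* Copy the next alphanumeric word, lowercased, into out.
 * Returns a pointer just past the word, or NULL when no word is left. */
static const char *next_word(const char *p, char *out, size_t outsz) {
    size_t n = 0;
    while (*p && !isalnum((unsigned char)*p)) p++;
    if (!*p) return NULL;
    while (*p && isalnum((unsigned char)*p)) {
        if (n + 1 < outsz) out[n++] = (char)tolower((unsigned char)*p);
        p++;
    }
    out[n] = '\0';
    return p;
}

/* Sum the scores of all pairs of adjacent words found in the table. */
static double score_text(const char *text) {
    char prev[MAX_WORD] = "", cur[MAX_WORD], key[2 * MAX_WORD];
    double total = 0.0;
    const char *p = text;
    while ((p = next_word(p, cur, sizeof cur)) != NULL) {
        if (prev[0]) {
            snprintf(key, sizeof key, "%s %s", prev, cur);
            total += phrase_lookup(key);   /* one hash lookup per word pair */
        }
        strcpy(prev, cur);
    }
    return total;
}
```

With phrases such as "free money" and "click here" loaded through
phrase_add, score_text walks the mail body once and does one table lookup
per word pair, so the cost per mail does not grow with the number of
phrases in the ruleset.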