Hi,

> > There are many pros and cons to a C version; i won't list them, i leave that
> > to your imagination.
> 
> I can imagine a few of them, but am curious what you are thinking of as
> the pros and cons.

ok...
pros:
- portability (on unix platforms)
- much better speed (on my test p3 the perl version with spamd/spamc
processes ~3 mails/sec, the c version about 11 mails/sec. the perl version
runs more checks, so the comparison is neither final nor fair)
note: the parsed, compiled regexps can be saved to a binary file and reused
for every check until the ruleset changes. this saves ~15ms/mail (~20% speedup)
- much smaller resource requirements (the compiled elf proggy is 30k, the
ruleset is 100k, and it uses ~300kb ram, which can't be compared to the
overhead of perl 5.6)
- i don't like the perl language ;)
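
The "compile once, reuse for every check" idea from the speed note can be sketched as below. This is only an illustration under assumptions: it uses POSIX `regcomp`/`regexec` rather than PCRE, it keeps the compiled patterns in memory instead of saving them to a binary file, and `struct rule`, `rules_compile`, `rules_score` and `rules_free` are hypothetical names, not part of the actual C version.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule: a pattern plus the score added when it matches. */
struct rule {
    const char *pattern;
    double score;
    regex_t compiled;   /* filled in by rules_compile() */
};

/* Compile every rule once, up front.
 * Returns 0 on success, -1 if any pattern fails to compile. */
int rules_compile(struct rule *rules, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (regcomp(&rules[i].compiled, rules[i].pattern,
                    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0) {
            /* Undo the patterns that were already compiled. */
            while (i > 0)
                regfree(&rules[--i].compiled);
            return -1;
        }
    }
    return 0;
}

/* Score one message by reusing the already-compiled patterns.
 * No parsing or compiling happens per mail. */
double rules_score(const struct rule *rules, size_t n, const char *mail)
{
    assert(mail != NULL);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (regexec(&rules[i].compiled, mail, 0, NULL, 0) == 0)
            total += rules[i].score;
    }
    return total;
}

/* Release the compiled patterns, e.g. when the ruleset changes. */
void rules_free(struct rule *rules, size_t n)
{
    for (size_t i = 0; i < n; i++)
        regfree(&rules[i].compiled);
}
```

A caller would run `rules_compile` once at startup, call `rules_score` for each incoming mail, and call `rules_free` followed by a fresh `rules_compile` only when the ruleset changes.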

cons:
- parallel development (need to keep in sync with perl version)
- portability (on non-unix)

i'm sure there are others, but the above ones are important for me.
a note on portability: yes, i know that perl should be portable, even more
than c. that's true in theory. we spent 2 weeks trying to get perl 5.6 working
on an rs/6000 running aix 4.2, without success. perl compiled, but one of its
self-tests failed... i'm sure someone could hack it into working, but
i don't like installing tons of CPAN stuff alongside a big, not-really-portable
perl interpreter on a high-traffic mail server.

so, my primary goal: make a small but very fast, efficient version to be
used on very high traffic mail servers, and, by allowing several instances
to run at the same time, make it possible to profit from SMP.
(afaik spamd only processes a single mail at a time)
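
Running several instances at once so that an SMP machine can work on multiple mails in parallel could look roughly like the sketch below. It is a simplification under assumptions: it forks one child per message and passes the score back through the exit status (so scores are capped at 255), and `score_mail`, `score_parallel` and `MAX_JOBS` are made-up names. A real filter would more likely use a fixed pool of workers and pipes or sockets.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 64   /* arbitrary cap for this sketch */

/* Placeholder scorer (hypothetical): one point per '!' in the message. */
static int score_mail(const char *mail)
{
    int score = 0;
    for (; *mail; mail++)
        if (*mail == '!')
            score++;
    return score;
}

/* Score n messages in parallel, one child process per message.
 * scores[i] receives the result for mails[i], or -1 if that child failed.
 * Returns 0 if every child reported a score, -1 otherwise. */
int score_parallel(const char *mails[], int n, int scores[])
{
    pid_t pids[MAX_JOBS];
    assert(n >= 0 && n <= MAX_JOBS);

    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) {
            /* fork failed: reap the children already started, then give up */
            for (int j = 0; j < i; j++)
                waitpid(pids[j], NULL, 0);
            return -1;
        }
        if (pids[i] == 0) {
            /* Child: score one mail and report it via the exit status. */
            int s = score_mail(mails[i]);
            _exit(s > 255 ? 255 : s);
        }
    }

    int rc = 0;
    for (int i = 0; i < n; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status)) {
            scores[i] = -1;
            rc = -1;
        } else {
            scores[i] = WEXITSTATUS(status);
        }
    }
    return rc;
}
```

Because each child is an independent process, the kernel is free to schedule them on different CPUs, which is the whole point of running several instances.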

for this goal, we may use asm optimization, and maybe switch from PCRE to
libtre (a faster regexp lib, but without back-references).
i'll implement very fast code (using a hashed tree-based search) for spam
phrase matching. i've found it very useful: many of the spam mails are
caught by it.
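
To illustrate tree-based phrase matching, here is a minimal sketch. It is a plain byte-wise trie with a 256-way child table per node, not necessarily the hashed tree described above, and all names (`struct node`, `trie_new`, `trie_add`, `trie_scan`, `trie_free`) and the integer scores are assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* One trie node: a child slot per byte value, plus the score of the
 * phrase that ends here (0 if no phrase ends at this node). */
struct node {
    struct node *child[256];
    int score;
};

/* Allocate an empty node; returns NULL on out-of-memory. */
struct node *trie_new(void)
{
    return calloc(1, sizeof(struct node));
}

/* Insert a phrase, folded to lower case.
 * Returns 0 on success, -1 on out-of-memory. */
int trie_add(struct node *root, const char *phrase, int score)
{
    assert(root != NULL && score > 0);
    struct node *n = root;
    for (const unsigned char *p = (const unsigned char *)phrase; *p; p++) {
        int c = tolower(*p);
        if (!n->child[c]) {
            n->child[c] = trie_new();
            if (!n->child[c])
                return -1;
        }
        n = n->child[c];
    }
    n->score = score;
    return 0;
}

/* Walk the trie from every offset of the text and add up the scores of
 * all phrases found, matching case-insensitively. */
int trie_scan(const struct node *root, const char *text)
{
    int total = 0;
    for (const unsigned char *s = (const unsigned char *)text; *s; s++) {
        const struct node *n = root;
        for (const unsigned char *p = s; *p; p++) {
            n = n->child[tolower(*p)];
            if (!n)
                break;            /* no phrase continues with this byte */
            total += n->score;    /* adds 0 unless a phrase ends here */
        }
    }
    return total;
}

/* Free the whole trie recursively. */
void trie_free(struct node *n)
{
    if (!n)
        return;
    for (int i = 0; i < 256; i++)
        trie_free(n->child[i]);
    free(n);
}
```

The scan checks all phrases at once, so its cost depends on the text length and the longest phrase, not on how many phrases are in the ruleset. Replacing the 256-way table with a small hash of children would trade some speed for much less memory.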

> > Are you interested in such thing in CVS ?
> 
> I'd love to see it.
> 
ok. as soon as i get it to production level (currently it prints lots of
debug stuff and only analyzes the mail (computes the score); it doesn't edit
headers or give back the filtered mail, though we don't really need that anyway).

one more question: i've seen in the docs some statistics from running
spamassassin on ~40,000 spam mails and a similar amount of non-spam.
can i access this spam collection/database? it would be useful for real-life
benchmarking. (currently i'm running it on ~1800 spam and ~60000 non-spam
mails for tests)


A'rpi / Astral & ESP-team

--
Developer of MPlayer, the Movie Player for Linux - http://www.MPlayerHQ.hu

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
