On Wed, 22 Sep 2004, +22:45:09 EEST (UTC +0300), Dan Mahoney, System Admin <[EMAIL PROTECTED]> pressed some keys:
> On Wed, 22 Sep 2004, Daniel Quinlan wrote:
> >Juhapekka Tolvanen <[EMAIL PROTECTED]> writes:
> >
> >>1) Switch off that Bayesian filter of SpamAssassin, because it is
> >>implemented in slow interpreted language called Perl.
> >>
> >>2) Use DSPAM as Bayesian-like filter, because it is implemented in
> >>lightning-fast compiled language called C.
>
> Okay, and not to get off the topic on your opinion on perl versus c,
> but the first thing perl does when it executes a script is compiles
> it. This is why spamd is a decent solution despite being written in
> perl, because it only starts up once.
>
> I'm not saying that a constantly-running perl program is as fast as a
> compiled C app all of the time, but if you're going to sit here and
> suggest changes to the SpamAssassin development team without possibly
> having evaluated 3.0.0-Release for 24 hours, you might want to drop
> the condescending attitude, since I'm *sure* we all know what perl and
> C are.

If you know so well what C and Perl are, then what do you think about
this:

http://www.nuclearelephant.com/projects/dspam/faq.html#1.7

And I'd especially like to know your opinion about this:

"Myth 4: PERL is designed for language processing, so SpamAssassin is
written in a more appropriate language. Let me preface this with the
fact that I've had about 10 years of experience coding PERL. While PERL
is very useful for language processing and web applications, it is also
an extremely slow, interpreted language. The average overhead for a
single PERL process is around 2MB of RAM. Even compiled PERL still
requires the use of a bootstrapped interpreter and bytecode
translation. PERL is very slow compared to a compiled language, and the
regular expression functions PERL supports for text extraction have
their roots in the C implementation of regular expressions, which are
much faster.
DSPAM has very low-level string functions coded in C which are
extremely fast, effective, and don't even require the use of
processor-intensive regular expressions. While PERL is useful for data
extraction and reporting, it is the completely wrong choice for
language processing, especially in a large-scale environment. If you
were analyzing one mailbox, PERL would be acceptable...but if you plan
on running this on a production system with live users, it is a death
wish."

I really don't care about the attitude of the author of DSPAM. I just
want to know how much faster SpamAssassin would be if its Bayesian
engine were replaced with something else, for example DSPAM. It does
not hurt to try it out and see what happens. And it does not hurt if
people have more alternatives.

I cannot code anything like that myself. I am just a (l)user. If some
software is slow on my machine, I really feel it and see it. Even
simple system monitor software (for example ProcMeter 3) shows when
something is taking too much memory and CPU time. And I can hear it
myself when the hard disks make the awful noise of swapping.

When I used SpamAssassin and its Bayesian filter on my home computer,
it really was slow. Sometimes I saw dozens of e-mails in the output of
the mailq command. I even switched from plain SpamAssassin to spamd. I
had to use renice. And then even rerude or chrt. Then I switched to
crm114 and it seems to work much faster: if I run the mailq command
right after boot, I may see a few e-mails in the queue, but most of the
time it says my mail queue is empty.

If you want to know what kind of computer I used, here are its specs:

http://iki.fi/juhtolv/eng/tietokone.eng.html

I got a better computer a few weeks after switching to crm114.

After all these horrible experiences it is painful to read when
somebody tries to explain how fast Perl is after all. Hell yes, I know
a Perl program is compiled when it starts, but that is not enough. A
real compiled language like C is faster in many cases.
But crm114 is an interpreted language, too. It is specially designed
for creating Bayesian-like algorithms, though. Maybe that is one reason
it runs so fast on my machine. Another reason may be that it does only
Bayesian-like filtering and nothing else (like querying RBLs or running
regexps).

BTW, creating an SA plugin that runs crm114 may be a good thing to try
out, too. And I don't mind if some people create bogofilter and
SpamProbe plugins for SA. Just do it, if you feel like it. But DSPAM
seems more interesting to me. I haven't been able to try it out,
because it is not yet available as a Debian package and I haven't yet
bothered to compile it myself. SpamAssassin is packaged in Debian
already, but version 3.0 is not yet available as a Debian package.

I reiterate: it does not hurt to try it out and see what happens.

-- 
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"halpojen hoitojen maailma uljas haluaa taistosi latistaa, mielesi
lipeävedellä valkaistuun ruotuunsa, joka on hautausmaa"            CMX