On Wed, 22 Sep 2004, +22:45:09 EEST (UTC +0300),
Dan Mahoney, System Admin <[EMAIL PROTECTED]> pressed some keys:

> On Wed, 22 Sep 2004, Daniel Quinlan wrote:

> >Juhapekka Tolvanen <[EMAIL PROTECTED]> writes:
> >
> >>1) Switch off that Bayesian filter of SpamAssassin, because it is
> >>implemented in slow interpreted language called Perl.
> >>
> >>2) Use DSPAM as Bayesian-like filter, because it is implemented in
> >>lightning-fast compiled language called C.

> Okay, and not to get off the topic on your opinion on perl versus c,
> but the first thing perl does when it executes a script is compiles
> it. This is why spamd is a decent solution despite being written in
> perl, because it only starts up once.

> I'm not saying that a constantly-running perl program is as fast as a
> compiled C app all of the time, but if you're going to sit here and
> suggest changes to the SpamAssassin development team without possibly
> having evaluated 3.0.0-Release for 24 hours, you might want to drop
> the condescending attitude, since I'm *sure* we all know what perl and
> C are.

If you know so well what C and Perl are, then what do you think about this:

http://www.nuclearelephant.com/projects/dspam/faq.html#1.7

And I'd especially like to know your opinion about this:

        "Myth 4: PERL is designed for language processing, so
        SpamAssassin is written in a more appropriate language.

        Let me preface this with the fact that I've had about 10
        years of experience coding PERL. While PERL is very useful
        for language processing and web applications, it is also an
        extremely slow, interpreted language. The average overhead
        for a single PERL process is around 2MB of RAM. Even compiled
        PERL still requires the use of a bootstrapped interpreter and
        bytecode translation. PERL is very slow compared to a compiled
        language, and the regular expression functions PERL supports
        for text extraction have their roots in the C implementation
        of regular expressions, which are much faster. DSPAM has very
        low-level string functions coded in C which are extremely fast,
        effective, and don't even require the use of processor-intensive
        regular expressions. While PERL is useful for data extraction
        and reporting, it is the completely wrong choice for language
        processing, especially in a large-scale environment. If you were
        analyzing one mailbox, PERL would be acceptable...but if you
        plan on running this on a production system with live users, it
        is a death wish."

I really don't care about the attitude of the author of DSPAM. I just want
to know how much faster SpamAssassin would be if its Bayesian engine were
replaced with something else, for example DSPAM. It does not hurt to try it
out and see what happens. And it does not hurt if people have more
alternatives.

I cannot code anything like that myself. I am just a (l)user. If some
software is slow on my machine, I really feel it and see it. Even a simple
system monitor (for example ProcMeter 3) shows when something is taking too
much memory and CPU time. And I can hear it myself when the hard disks make
the awful noise of swapping.

When I used SpamAssassin and its Bayesian filter on my home computer, it
really was slow. Sometimes I saw dozens of e-mails in the output of the
mailq command. I even switched from plain SpamAssassin to spamd. I had to
use renice, and then even rerude or chrt. Then I switched to crm114, and it
seems to work much faster: if I run mailq right after boot, I may see a few
e-mails in the queue, but most of the time it says my mail queue is empty.

If you want to know what kind of computer I used, here are its specs:

http://iki.fi/juhtolv/eng/tietokone.eng.html

I got a better computer a few weeks after switching to crm114.

After all these horrible experiences it is painful to read somebody
explaining how fast Perl is after all. Hell yes, I know a Perl program is
compiled when it starts, but that is not enough. A truly compiled language
like C is faster in many cases.

But crm114 is an interpreted language, too. It is specially designed for
creating Bayesian-like algorithms, though. Maybe that is one reason it runs
so fast on my machine. Another reason may be that it does only
Bayesian-like filtering and nothing else (like querying RBLs or running
regexps).

BTW, creating an SA plugin that runs crm114 may be a good thing to try out,
too. And I don't mind if some people create bogofilter and SpamProbe
plugins for SA. Just do it, if you feel like it. But DSPAM seems more
interesting to me. I haven't been able to try it out, because it is not yet
available as a Debian package and I haven't yet bothered to compile it
myself. SpamAssassin is packaged in Debian already, but version 3.0 is not
yet available as a Debian package.

I reiterate: it does not hurt to try it out and see what happens.
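
For anyone who wants to try, a rough timing comparison could look
something like the sketch below. It is only an illustration in Python, not
anything taken from SpamAssassin, DSPAM or crm114: the function name
time_filter and the placeholder command are made up, and the real command
line of each filter has to be filled in by whoever runs it. It starts one
process per message, so interpreter start-up cost is counted too, which is
the cost a long-running daemon such as spamd is meant to avoid.

```python
import subprocess
import sys
import time


def time_filter(command, messages):
    """Pipe each message to `command` on stdin; return total wall-clock seconds."""
    start = time.perf_counter()
    for message in messages:
        # One new process per message, so start-up cost is included.
        subprocess.run(command, input=message,
                       stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical placeholder filter: a Python one-liner that only reads
    # stdin. Replace it with the command line of the filter under test.
    placeholder = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    sample = [b"Subject: test\n\nhello\n"] * 20
    elapsed = time_filter(placeholder, sample)
    print(f"{len(sample)} messages in {elapsed:.2f} s "
          f"({elapsed / len(sample) * 1000:.1f} ms per message)")
```

Running the same set of messages through each filter and comparing the
per-message figures would give a first impression, although memory use and
swapping would still have to be watched separately.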


-- 
Juhapekka "naula" Tolvanen * http colon slash slash iki dot fi slash juhtolv
"halpojen hoitojen maailma uljas haluaa taistosi latistaa, mielesi
lipeävedellä valkaistuun ruotuunsa, joka on hautausmaa"                  CMX
