sa-learn vs spamassassin tests

Michael Scheidell Tue, 05 Sep 2006 07:09:15 -0700

sa-learn -L --spam and spamassassin -L -r learn the same spam differently.
This is SA version 3.13. Using a db or sql database doesn't seem to matter,
and running --sync or not doesn't matter either.


Also, it doesn't matter if I run sa-learn --spam or spamassassin -r first.

Further, spamassassin -r and sa-learn --spam give different results for
the same message:


running spamassassin -Lr against a clean db with my test email gives me
130 tokens.
running sa-learn -L --spam against a clean db, same test email gives me
146 tokens.

Test:

# clean out old sa db:
rm -rf /var/db/spamassassin

#create new one:
mkdir /var/db/spamassassin
chown vscan:vscan /var/db/spamassassin

#test it:
su - vscan -c "sa-learn --sync && sa-learn --dump magic"

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
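
Since every step below compares the ntokens line of the dump, a small
helper can pull out just that number. This is only a sketch: the function
name ntokens is made up for illustration, and it assumes the dump keeps
the column layout shown above (value in the third field).

```shell
# Print the ntokens value from `sa-learn --dump magic` output read on stdin.
# Assumption: the value is the third whitespace-separated field, as in the
# dumps pasted in this message.
ntokens() {
    awk '/non-token data: ntokens/ { print $3 }'
}

# Intended use (not run here):
#   su - vscan -c "sa-learn --dump magic" | ntokens
```

Piping each dump through it leaves a single number to compare per run.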

# run a new email through it (let's not mess with dcc, razor, spamcop
# for this test)
su - vscan -c "spamassassin -rL < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
1 message(s) examined.

0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        130          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157160113          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

with sync: (no difference)
su vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        130          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157160113          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Now try sa-learn:
su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
Learned tokens from 1 message(s) (1 message(s) examined)

Yep, it does something different enough.

0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        227          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157464500          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
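
The magic dump only shows how many tokens were learned, not which ones.
To see which tokens differ, one option would be to save the output of
sa-learn --dump data after each clean-db run and compare the token
columns. The sketch below is only that: token_diff and the file names
are hypothetical, and it assumes the token (a hash in SA 3.x) is the
fifth whitespace-separated field of each data line.

```shell
# Print tokens that appear in only one of two saved `sa-learn --dump data`
# files. Assumption: the token is the fifth whitespace-separated field.
token_diff() {
    a=$(mktemp) && b=$(mktemp) || return 1
    awk '{ print $5 }' "$1" | sort -u > "$a"
    awk '{ print $5 }' "$2" | sort -u > "$b"
    # comm -3 hides lines common to both inputs: tokens only in the first
    # file start at column 1, tokens only in the second are tab-indented.
    comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Intended use (hypothetical file names, not run here):
#   su - vscan -c "sa-learn --dump data" > /tmp/after-spamassassin.txt
#   ... reset the db, learn with sa-learn -L --spam ...
#   su - vscan -c "sa-learn --dump data" > /tmp/after-salearn.txt
#   token_diff /tmp/after-spamassassin.txt /tmp/after-salearn.txt
```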

#control: let's run sa-learn first:
rm -rf /var/db/spamassassin
sme-500# mkdir -p spamassassin
sme-500# chown vscan:vscan spamassassin
sme-500# su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        146          0  non-token data: ntokens
0.000          0 1157464809          0  non-token data: oldest atime
0.000          0 1157464809          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

(remember, if we ran spamassassin -r first, we only got 130 tokens)

su - vscan -c "spamassassin -Lr < /tmp/spam.eml"
1 message(s) examined.
sme-500# su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        227          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157464809          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Same 227 tokens either way, so the order of spamassassin -Lr and
sa-learn -L --spam doesn't matter, but do we need to do both?
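
For what it's worth, the three counts above say how much the two token
sets overlap, assuming ntokens counts distinct tokens and that learning
a second time only adds tokens. Under that assumption the arithmetic is
plain inclusion-exclusion (variable names are just for illustration):

```shell
# Token counts taken from the dumps above.
via_spamassassin=130   # spamassassin -Lr alone, clean db
via_salearn=146        # sa-learn -L --spam alone, clean db
combined=227           # after running both, in either order

# Inclusion-exclusion: tokens that both tools produced.
shared=$((via_spamassassin + via_salearn - combined))
only_spamassassin=$((via_spamassassin - shared))
only_salearn=$((via_salearn - shared))

echo "shared=$shared only_spamassassin=$only_spamassassin only_salearn=$only_salearn"
# prints: shared=49 only_spamassassin=81 only_salearn=97
```

So only 49 tokens would be common to both, and nspam going to 2 shows the
second learn was counted as a new message rather than a repeat.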


-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131
