BUG? sa-learn --ham vs spamassassin -r different results

Michael Scheidell Wed, 06 Sep 2006 02:59:59 -0700

sa-learn -L --spam and spamassassin -L -r learn the same spam differently.
SA version 3.1.3. Using the DB or SQL backend doesn't seem to matter, and
running --sync or not doesn't matter either.


Also, it doesn't matter whether I run sa-learn --spam or spamassassin -r
first.

Put concretely, spamassassin -r and sa-learn --spam extract a different
number of tokens from the same message:


Running spamassassin -Lr against a clean db with my test email gives me
130 tokens.
Running sa-learn -L --spam against a clean db with the same test email
gives me 146 tokens.

Test:

# clean out old sa db:
rm -rf /var/db/spamassassin

# create a new one:
mkdir /var/db/spamassassin
chown vscan:vscan /var/db/spamassassin

# test it:
su - vscan -c "sa-learn --sync && sa-learn --dump magic"

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

# run the test email through it (-L: let's not involve DCC, Razor or
# SpamCop in this test)
su - vscan -c "spamassassin -rL < /tmp/spam.eml"
1 message(s) examined.
su - vscan -c "sa-learn --sync && sa-learn --dump magic"

0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        130          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157160113          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Running --sync again makes no difference:
su vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        130          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157160113          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Now try sa-learn:
su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
Learned tokens from 1 message(s) (1 message(s) examined)
su - vscan -c "sa-learn --sync && sa-learn --dump magic"

Sure enough, sa-learn learns the message again (nspam is now 2) and adds
97 more tokens:

0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        227          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157464500          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count
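To compare runs like these without eyeballing the columns, the counters can be pulled out of saved dumps with a small shell sketch. The functions magic_value and magic_delta are hypothetical helpers, not part of SpamAssassin, and the parsing assumes each dump line is unwrapped (unlike in this mail) with the value in the third column.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SpamAssassin) for comparing two saved
# `sa-learn --dump magic` outputs. Assumes each counter is on one
# unwrapped line such as:
#   0.000          0        130          0  non-token data: ntokens
# i.e. the value is the third whitespace-separated field.

# magic_value FILE NAME: print the value of counter NAME (e.g. ntokens).
magic_value() {
  awk -v name="$2" '
    NF == 7 && $5 == "non-token" && $6 == "data:" && $7 == name { print $3 }
  ' "$1"
}

# magic_delta BEFORE AFTER: print how nspam, nham and ntokens changed.
magic_delta() {
  for name in nspam nham ntokens; do
    before=$(magic_value "$1" "$name")
    after=$(magic_value "$2" "$name")
    echo "$name: $before -> $after (change: $((after - before)))"
  done
}

# Example (file paths are illustrative):
#   su - vscan -c "sa-learn --sync && sa-learn --dump magic" > /tmp/before.txt
#   su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
#   su - vscan -c "sa-learn --sync && sa-learn --dump magic" > /tmp/after.txt
#   magic_delta /tmp/before.txt /tmp/after.txt
```

With the two dumps above (after spamassassin -Lr, then after sa-learn -L --spam), this would print "nspam: 1 -> 2 (change: 1)", "nham: 0 -> 0 (change: 0)" and "ntokens: 130 -> 227 (change: 97)".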

# control: let's run sa-learn first:
rm -rf /var/db/spamassassin
mkdir -p /var/db/spamassassin
chown vscan:vscan /var/db/spamassassin
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0          0          0  non-token data: ntokens
0.000          0          0          0  non-token data: oldest atime
0.000          0          0          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

su - vscan -c "sa-learn -L --spam < /tmp/spam.eml"
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          1          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        146          0  non-token data: ntokens
0.000          0 1157464809          0  non-token data: oldest atime
0.000          0 1157464809          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

(Remember: when we ran spamassassin -r first, we got only 130 tokens.)

su - vscan -c "spamassassin -Lr < /tmp/spam.eml"
1 message(s) examined.
su - vscan -c "sa-learn --sync && sa-learn --dump magic"
0.000          0          3          0  non-token data: bayes db version
0.000          0          2          0  non-token data: nspam
0.000          0          0          0  non-token data: nham
0.000          0        227          0  non-token data: ntokens
0.000          0 1157160113          0  non-token data: oldest atime
0.000          0 1157464809          0  non-token data: newest atime
0.000          0          0          0  non-token data: last journal sync atime
0.000          0          0          0  non-token data: last expiry atime
0.000          0          0          0  non-token data: last expire atime delta
0.000          0          0          0  non-token data: last expire reduction count

Same 227 either way, so it doesn't matter whether we run spamassassin -Lr
or sa-learn -L --spam first. But do we need to run both?
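Assuming ntokens counts distinct tokens and nothing expired between runs, the numbers are consistent with each tool learning an overlapping but different token set. Writing A for the tokens spamassassin -Lr learns (130) and B for those sa-learn -L --spam learns (146), inclusion-exclusion gives:

```latex
\begin{aligned}
|A \cap B|      &= |A| + |B| - |A \cup B| = 130 + 146 - 227 = 49 \\
|B \setminus A| &= |B| - |A \cap B| = 146 - 49 = 97 = 227 - 130 \\
|A \setminus B| &= |A| - |A \cap B| = 130 - 49 = 81 = 227 - 146
\end{aligned}
```

Under those assumptions, the jumps seen in both orders (130 to 227, and 146 to 227) are exactly the tokens that only the second command picks up.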


-- 
Michael Scheidell, CTO
SECNAP Network Security / www.secnap.com
[EMAIL PROTECTED]  / 1+561-999-5000, x 1131


_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
