Don Levey wrote:
> Please forgive me if this is in the archives; I'm having trouble
> finding it.
>
> I've just finished training my Bayes DB using sa-learn (perversely,
> when I was trying to collect 200 spam messages, the spammers decided
> to stop sending to me). Now that the DB is usable, it's interesting
> that while most ham messages produce at least one small rule hit and
> a negative Bayes score that results in "Autolearn=no", when BAYES_00
> is the ONLY rule that hits I get "Autolearn=failed".
>
> Two quick questions:
> 1) What should I do about this, and
> 2) Should I worry, or just ignore it?
>
> TIA,
> -Don

I may have found at least part of the problem, as far as the "autolearn=no" portion of the question goes. Running a message through "spamassassin -D --mbox < msgfile" gives me the following last few lines:

debug: running body-text per-line regexp tests; score so far=8.886
debug: running uri tests; score so far=8.886
debug: running raw-body-text per-line regexp tests; score so far=8.886
debug: running full-text regexp tests; score so far=8.886
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 8.886, computed score for autolearn: 7.223
debug: auto-learn? ham=0.1, spam=12, body-points=3.1, head-points=3.64, learned-points=-1.096
debug: auto-learn? no: inside auto-learn thresholds, not considered ham or spam
debug: is spam? score=8.886 required=5
debug: tests=BAYES_40,DATE_IN_FUTURE_03_06,FORGED_YAHOO_RCVD,MIME_HEADER_CTYPE_ONLY,NO_OBLIGATION,SUBJ_LIFE_INSURANCE,URIBL_OB_SURBL,URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,__HAS_SUBJECT,__MSGID_OK_DIGITS,__MSGID_OK_HEX,__MSGID_OK_HOST,__RCVD_IN_NJABL,__RCVD_IN_SORBS,__RFC_IGNORANT_ENVFROM,__SANE_MSGID

So somewhere I have it set that a message needs a score of 12 to be autolearned as spam, and a score below 0.1 to be autolearned as ham. This particular message scored 11.9.

The next step was to try a message that had a score greater than 12. The example I chose also had "autolearn=failed" in the header. Running the same debug command line, I got:

debug: running body-text per-line regexp tests; score so far=15.837
debug: running uri tests; score so far=15.837
debug: running raw-body-text per-line regexp tests; score so far=15.837
debug: running full-text regexp tests; score so far=15.837
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 15.837, computed score for autolearn: 13.387
debug: auto-learn? ham=0.1, spam=12, body-points=11.404, head-points=5.843, learned-points=0.001
debug: auto-learn? yes, spam (13.387 > 12)
debug: Learning Spam
<debug tokenizing messages removed for brevity>
debug: bayes: 20664 untie-ing
debug: bayes: 20664 untie-ing db_toks
debug: bayes: 20664 untie-ing db_seen
debug: bayes: files locked, now unlocking lock
debug: unlock: 20664 unlink /etc/mail/spamassassin/bayes_db.lock
debug: is spam? score=15.837 required=5
debug: tests=BAYES_50,FORGED_YAHOO_RCVD,MIME_HEADER_CTYPE_ONLY,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_XBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL,URIBL_WS_SURBL
debug: subtests=__BAT_BOUNDARY,__CT,__CTYPE_HAS_BOUNDARY,__HAS_MSGID,__HAS_SUBJECT,__MSGID_OK_HOST,__RCVD_IN_SBL_XBL,__RFC_IGNORANT_ENVFROM,__SANE_MSGID

As should be clear here, it says that the message WAS autolearned, and the message headers generated from this run did contain "autolearn=spam". I am doing this as the same user that is running spamd (the platform is Fedora, where the spamassassin "service" that runs is spamd). I had been hoping to get debug messages from the above, but everything was fine. Checking my maillog, however, hit a bit of paydirt:

Apr 1 09:40:01 davinci spamd[9864]: connection from davinci.example.com [127.0.0.1] at port 41609
Apr 1 09:40:01 davinci spamd[9864]: info: setuid to root succeeded
Apr 1 09:40:01 davinci spamd[9864]: Still running as root: user not specified with -u, not found, or set to root. Fall back to nobody.
Apr 1 09:40:01 davinci spamd[9864]: processing message <[EMAIL PROTECTED]> for root:99.
Apr 1 09:40:01 davinci spamd[9864]: bayes expire_old_tokens: lock: 9864 cannot create tmp lockfile /etc/mail/spamassassin/bayes_db.lock.davinci.example.com.9864 for /etc/mail/spamassassin/bayes_db.lock: Permission denied
Apr 1 09:40:01 davinci spamd[9864]: cannot write to /etc/mail/spamassassin/bayes_db_journal, Bayes db update ignored: Permission denied
Apr 1 09:40:07 davinci spamd[9864]: clean message (-4.9/5.0) for root:99 in 6.1 seconds, 3079 bytes.
Apr 1 09:40:07 davinci spamd[9864]: result: . -4 - BAYES_00 scantime=6.1,size=3079,mid=<[EMAIL PROTECTED]>, bayes=0,autolearn=failed

Note that I am getting a permissions error when creating the lock file. This seems to be because the permissions on the /etc/mail/spamassassin directory do not permit the user that spamd is running as (it falls back to 'nobody', per the log) to write the lock file. I've fixed this, at least temporarily, while I sort out the user ID situation, and now I'm autolearning.

Why am I telling you all of this? Because someone you know may be in a similar situation, or *you* may be in a similar situation. This at least gets the info in the archives (perhaps again) so that it may be found.

Thanks for your time,
-Don
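
For anyone chasing the same "autolearn=failed" symptom, the permission problem in the log above can be checked directly. The following is only a rough sketch: the function name can_create_lock and the probe file name are made up for illustration, and it assumes you run it as the account spamd actually processes mail as (not as root, which can write almost anywhere).

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): report whether the
# current user can create a file in the given directory. Bayes locking
# creates a temporary lock file next to bayes_db.lock, so the directory
# itself must be writable, not only the database files inside it.
can_create_lock() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 2
    fi
    # Probe file name is arbitrary; $$ keeps it unique per shell process.
    probe="$dir/.lock_probe.$$"
    # The subshell keeps a failed redirection from aborting the caller.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        echo "writable"
        return 0
    fi
    echo "not writable"
    return 1
}

# Example call, using the directory from the log above:
#   can_create_lock /etc/mail/spamassassin
```

If this prints "not writable" for the directory holding the Bayes files when run as the spamd user, the lock file cannot be created and autolearning will fail in the way shown in the maillog.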