>> Can anyone explain why deleting the database seemed to fix its
>> learning?  Is it just a corrupt database or something I am liable to hit
>> again further down the track?

JM> It must have learned those Message-IDs before.  It will not learn the same
JM> message ID twice.

OK, here's the deal.

There is a file called bayes_toks where all the learned
tokens are stored.  There is a file called bayes_seen,
which sa-learn checks before learning to make sure the
message is one it has not already learned.
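To make the two-file idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not SpamAssassin's actual code or file format: the class name, the whitespace tokenizer, and the in-memory set and counter standing in for bayes_seen and bayes_toks are all assumptions made for illustration.

```python
from collections import Counter


class ToyBayesStore:
    """Toy model: `toks` stands in for bayes_toks, `seen` for bayes_seen."""

    def __init__(self):
        self.toks = Counter()  # token -> number of learned spam messages containing it
        self.seen = set()      # Message-IDs that have already been learned

    def learn(self, message_id, text):
        """Learn one spam message; return True if anything was learned."""
        if message_id in self.seen:
            return False       # already learned: skip it, counts stay unchanged
        # Hypothetical tokenizer: lowercase, split on whitespace, one count per message.
        self.toks.update(set(text.lower().split()))
        self.seen.add(message_id)
        return True


store = ToyBayesStore()
first = store.learn("<1@example.com>", "cheap pills cheap")
second = store.learn("<1@example.com>", "cheap pills cheap")  # same Message-ID
print(first, second, store.toks["cheap"])
```

In this model, emptying `seen` while keeping `toks` lets the same message be learned again, and its token counts go up a second time.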

Here's where things can go wrong:  if something happens
to corrupt the bayes_toks file, its contents can be lost.
At least in some circumstances, sa-learn will overwrite an
old (big) bayes_toks file with just the new material learned
in a given session. (This happened to me once when there was
an out-of-memory problem, and once when for some reason
sa-learn could not access one of the files. I run a
nightly cron job to feed new spam to sa-learn, and it is at
this stage that the problems occurred.)
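One way to protect a nightly job against this is to snapshot the token file before the learning run and put it back if it has vanished or shrunk sharply afterwards. The sketch below is an assumption-laden illustration, not something sa-learn provides: the function names and the 50% threshold are made up, and a database that is legitimately pruned may shrink for good reasons, so the threshold would need tuning.

```python
import os
import shutil


def snapshot(toks_path, backup_path):
    """Copy the token file aside before a learning run; return its size in bytes."""
    shutil.copy2(toks_path, backup_path)
    return os.path.getsize(toks_path)


def restore_if_shrunk(toks_path, backup_path, size_before, min_ratio=0.5):
    """Restore the snapshot if the token file is missing or much smaller.

    Returns True when a restore happened. `min_ratio` is an arbitrary
    illustrative threshold, not a recommended value.
    """
    size_after = os.path.getsize(toks_path) if os.path.exists(toks_path) else 0
    if size_after < size_before * min_ratio:
        shutil.copy2(backup_path, toks_path)
        return True
    return False
```

A cron wrapper would call `snapshot()` first, run the learning step, then call `restore_if_shrunk()` and send a warning if it returns True.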

I was able to fix my problems by recovering the old
bayes_toks file from my mirrored backup drive, REMOVING
the other files from the directory (bayes_seen, etc.), and
then running sa-learn --build.

Removing bayes_seen was absolutely essential to this
process; otherwise Bayes would refuse to learn the new data.
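For anyone who wants to script the file-handling part of that recovery, here is a rough Python sketch. It only moves files around and does not invoke sa-learn. The function name is hypothetical, and treating every file whose name starts with "bayes_" as one of "the other files" is my assumption; check what is actually in your directory, and work on a copy first.

```python
import os
import shutil


def restore_bayes_dir(bayes_dir, backup_toks, keep="bayes_toks"):
    """Remove the other bayes_* files, then put the backed-up token file in place.

    Returns the sorted list of file names that were removed.
    """
    removed = []
    for name in sorted(os.listdir(bayes_dir)):
        path = os.path.join(bayes_dir, name)
        # Assumption: the files to clear share the "bayes_" prefix (e.g. bayes_seen).
        if name.startswith("bayes_") and name != keep and os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    # Overwrite (or create) the token file from the backup copy.
    shutil.copy2(backup_toks, os.path.join(bayes_dir, keep))
    return removed
```

After this step the rebuild is still run by hand, as described above.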

I think there is still a problem: I now have an incomplete
bayes_seen file, so sa-learn will end up relearning spam
whose tokens are already in bayes_toks. On the other hand,
Bayes is working beautifully this way, perhaps because the
corpus is simply large enough to absorb the possible
duplication of tokens. (Also, if the same spam is being sent
repeatedly over time, that in itself is an indication that
it is particularly likely to be spam, so it is possible that
I have inadvertently created a system weighted more heavily
toward spam frequency and volume.)

Anyway - if you aren't experiencing problems with the Bayes
function, I wouldn't worry about the learning process -- but
if you are, then you may be able to rectify things by
removing the bayes_seen file and running a rebuild.

(I reported my issue as a bug, because I think that sa-learn
should abort when it runs into problems rather than
overwrite and destroy essential files).

-Abigail


