On Fri, 16 Oct 2015 20:59:52 -0500
Ryan Coleman wrote:

> How do I go about checking that my automated scripts that handle spam
> learning are actually learning? I have literally hundreds of emails a
> day that go into the "new" folder I have set up and it does not seem
> to be learning from them. 
> ...
> 
> sa-learn commands:
> [scans domains for specified folders and scans them]
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d
> >   -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> > /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d
> >   -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;

There are a few things wrong with this.

The * in {}* is at best superfluous and may be causing problems. It
wouldn't work at all with a POSIX-compliant shell.

Also, for a maildir folder foo you are running sa-learn separately on
foo/, foo/cur, foo/new and foo/tmp. sa-learn understands maildir, so
training on new and cur separately just adds unnecessary parsing and
extra invocations of sa-learn. You shouldn't be training on tmp at all
because you might get an incomplete email.
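
Something along these lines is what I have in mind: one sa-learn run per
matching folder, with a plain {} and no glob. This is only a sketch.
SA_LEARN and learn_spam_folders are names I made up for illustration,
and the paths in the comments are the ones from your script.

```shell
#!/bin/sh
# Sketch only: SA_LEARN and learn_spam_folders are illustrative names,
# not part of SpamAssassin.
SA_LEARN="${SA_LEARN:-/usr/bin/sa-learn}"

learn_spam_folders() {
    # $1 = base directory to search
    # $2 = find -name pattern for the maildir folder
    # Pass the folder itself to sa-learn and let it handle cur/new.
    find "$1" -type d -name "$2" \
        -exec "$SA_LEARN" --no-sync --spam --progress {} \;
}

# Intended use, with the paths from the original script:
#   learn_spam_folders /var/mail/vhosts/ '*.Spam.New*'
#   learn_spam_folders /var/mail/vhosts/ '*.Spam.Suspected*'
#   /usr/bin/sa-learn --sync    # flush the journal once at the end
```

Because every run uses --no-sync, a single sa-learn --sync after the
last folder is enough.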

Also I don't see anything about learning ham.


Once you've fixed your script, append the following:

   sa-learn -D bayes --dump magic >> /var/tmp/sa-debug 2>&1

and then let the script run as it normally would, from cron or
whatever.

When you look at the output file, check nspam is increasing as new spam
is trained and that nspam and nham are both over 200. 
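
If you'd rather not eyeball the file, the counts can be pulled out of the
plain "sa-learn --dump magic" output (without -D). A rough sketch follows.
bayes_count and bayes_ready are made-up helper names, the parsing assumes
the usual "non-token data: nspam" lines with the value in the third
column, and the 200 threshold is applied here as "at least 200".

```shell
#!/bin/sh
# Sketch only: bayes_count and bayes_ready are illustrative helpers.
# Both read `sa-learn --dump magic` output on stdin.

bayes_count() {
    # $1 = counter name, e.g. nspam or nham
    # Lines look like: "0.000  0  1234  0  non-token data: nspam"
    awk -v key="$1" '$0 ~ ("non-token data: " key "$") { print $3 }'
}

bayes_ready() {
    # Succeeds only when both counters are at least 200.
    dump=$(cat)
    nspam=$(printf '%s\n' "$dump" | bayes_count nspam)
    nham=$(printf '%s\n' "$dump" | bayes_count nham)
    echo "nspam=${nspam:-0} nham=${nham:-0}"
    [ "${nspam:-0}" -ge 200 ] && [ "${nham:-0}" -ge 200 ]
}

# Intended use:
#   sa-learn --dump magic | bayes_ready || echo "Bayes not trained enough yet"
```

Run it before and after a training pass and compare the nspam figure.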

Then check that delivery and training are using the same database. Look
at the location of the bayes files in the debug. Take a look at the
mtime of the bayes journal file in the same directory, and check that
it's updated during a mail delivery scan.
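
One way to check the mtime without squinting at ls output is to touch a
marker file, let a message be delivered and scanned, and then ask find
whether the journal is newer than the marker. A sketch:
journal_updated_since is a made-up name, and the journal path in the
comment is only a guess at a common default, so use whatever location
the debug output reports.

```shell
#!/bin/sh
# Sketch only: journal_updated_since is an illustrative helper.

journal_updated_since() {
    # $1 = marker file touched before the test delivery
    # $2 = path to the bayes journal file (take it from the debug output)
    # Succeeds if the journal's mtime is newer than the marker's.
    [ -n "$(find "$2" -newer "$1" 2>/dev/null)" ]
}

# Intended use (journal path is an assumption, adjust to your setup):
#   touch /var/tmp/bayes-marker
#   ... send yourself a test message and wait for delivery ...
#   journal_updated_since /var/tmp/bayes-marker "$HOME/.spamassassin/bayes_journal" \
#       && echo "journal touched by delivery scan"
```

If the journal doesn't move, delivery is probably running as a different
user or with a different bayes path than your training script.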
