On Fri, 16 Oct 2015 20:59:52 -0500
Ryan Coleman wrote:

> How do I go about checking that my automated scripts that handle spam
> learning are actually learning? I have literally hundreds of emails a
> day that go into the "new" folder I have set up and it does not seem
> to be learning from them.
> ...
>
> sa-learn commands:
> [scans domains for specified folders and scans them]
>
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.New*' -type d
> -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;
> /usr/bin/find /var/mail/vhosts/ -name '*.Spam.Suspected*' -type d
> -exec /usr/bin/sa-learn --no-sync --spam --progress {}* \;

There are a few things wrong with this. The * in {}* is at best
superfluous and may be causing problems. It wouldn't work at all with a
POSIX compliant shell.

Also, for a maildir folder foo you are running sa-learn separately on
foo/, foo/cur, foo/new and foo/tmp. sa-learn understands maildir, so
training on new and cur separately only adds unnecessary parsing and
extra invocations of sa-learn. You shouldn't be training on tmp at all
because you might get an incomplete email.

Also, I don't see anything about learning ham.

Once you've fixed your script, append the following:

  sa-learn -D bayes --dump magic >> /var/tmp/sa-debug 2>&1

and then let the script run as it normally would, from cron or
whatever. When you look at the output file, check that nspam is
increasing as new spam is trained and that nspam and nham are both over
200.

Then check that delivery and training are using the same database. Look
at the location of the bayes files in the debug output. Take a look at
the mtime of the bayes journal file in the same directory, and check
that it's updated during a mail delivery scan.
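
To make the advice above concrete, here is a rough sketch of what a fixed-up script could look like. It is not a drop-in replacement: the function names are made up for illustration, the paths and folder patterns are copied from the quoted commands, and the parsing assumes the usual "--dump magic" layout where the count is the third field on the lines ending in nspam and nham. No ham folder is included because the original post does not name one. The final --sync call is my own addition to flush the journal after the --no-sync runs.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; adjust paths to your setup.

# Train once per matching maildir. The trailing * is dropped so that
# sa-learn receives the folder itself and walks cur/ and new/ on its own.
# $1 = mail root, e.g. /var/mail/vhosts/
train_spam() {
    find "$1" -type d \( -name '*.Spam.New*' -o -name '*.Spam.Suspected*' \) \
        -exec sa-learn --no-sync --spam --progress {} \;
    # Assumption: sync the journal once at the end, since every run
    # above used --no-sync.
    sa-learn --sync
}

# Read "sa-learn --dump magic" output on stdin and print "nspam nham".
# Assumes the count is field 3 and the line ends in the counter name.
bayes_counts() {
    awk '$NF == "nspam" { s = $3 }
         $NF == "nham"  { h = $3 }
         END { print s + 0, h + 0 }'
}

# Succeed only when both counters have reached 200.
bayes_ready() {
    set -- $(bayes_counts)
    [ "$1" -ge 200 ] && [ "$2" -ge 200 ]
}
```

From cron you might call train_spam /var/mail/vhosts/ and then run something like "sa-learn --dump magic | bayes_ready || echo 'bayes not trained enough yet'". Comparing the numbers printed by bayes_counts between runs shows whether nspam is actually going up.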