On Mon, 28 Dec 2015 23:42:03 -0500 Bill Cole wrote:
> Using these facts, my learning script that runs as root and reads
> from multiple real users' Maildirs does this to learn ham:
>
> for AFILE in $HAMS ; do formail < $AFILE ; done| sudo -H -u
> $SAUSER sa-learn --ham --mbox
>
> Where $HAMS is the list of ham message files and $SAUSER is the user
> handling the system-wide BayesDB. I use formail there just to give
> each message a leading 'From ' line (i.e. mbox format) so that the
> whole bunch can be piped into a single sa-learn invocation.

IIRC when you do that sa-learn just creates a temporary file and then
runs on that.

> The alternative without formail would be to pipe each raw message into
> its own sa-learn.

The alternative is to give it a directory. It can work out for itself
whether it's maildir or just a directory of files. If you need to train
an arbitrary selection of files, you could symlink them into a
temporary directory. If you run spamd it's also possible to train via
spamc.

Personally I'd avoid the unforced use of mbox around Bayes without
being sure that "From-escaping" is taken account of. The problem is
that formail will replace a "From" at the beginning of a body line with
">From" which changes the msgid hash and prevents the correct
retraining of mail that was trained without going through formail -
e.g. the correction of autotraining. I just had a quick look and I
can't see any support for this in SpamAssassin.

It's not a major problem, but in this case it's an easily avoidable
one.
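
For what it's worth, the symlink idea could look roughly like the sketch below. The function name stage_links and its arguments are made up for illustration; only $HAMS and $SAUSER come from the quoted script. It assumes sa-learn follows symlinks when scanning a directory, and that $SAUSER can read both the temporary directory and the link targets. If it can't, copying the files instead of linking them would be the obvious substitute.

```shell
# Sketch only: stage_links is a hypothetical helper, not part of SpamAssassin.
# Usage: stage_links DESTDIR FILE...
# Symlinks each FILE into DESTDIR under a numbered name, so that equal
# basenames coming from different Maildirs cannot collide.
stage_links() {
    dest=$1
    shift
    n=0
    for f in "$@"; do
        # A symlink target must be absolute to resolve from inside DESTDIR.
        case $f in
            /*) ;;
            *) f=$PWD/$f ;;
        esac
        n=$((n + 1))
        ln -s "$f" "$dest/$n.$(basename "$f")" || return 1
    done
}

# Intended use (not run here), reusing $HAMS and $SAUSER from the quoted
# script. $HAMS is left unquoted on purpose so it splits into file names.
#
#   TMPD=$(mktemp -d) && chmod 755 "$TMPD"
#   stage_links "$TMPD" $HAMS
#   sudo -H -u "$SAUSER" sa-learn --ham "$TMPD"
#   rm -rf "$TMPD"
```

No message passes through formail this way, so nothing gets From-escaped. The spamc route would be one invocation per message, along the lines of "spamc -L ham < $AFILE", and it only works if spamd was started with --allow-tell.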