On Mon, 7 Nov 2016, Sean Greenslade wrote:

On November 7, 2016 9:26:29 AM PST, Eric Abrahamsen <e...@ericabrahamsen.net> 
wrote:
What a lot of people (including myself) do is have two IMAP folders
learn/spam and learn/ham. When a message is incorrectly classified you
put it in the right folder, then run sa-learn on a cron job, looking in
the appropriate folder, then afterwards move the message to Junk or
INBOX, depending.
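
A minimal sketch of what such a cron-driven step could look like, in Python. It is not the script anyone in this thread actually runs. The function name `learn_and_move` and the injectable `run` parameter are made up for illustration, and it assumes `learn_dir` points at a directory that holds one message per file (for a Maildir folder, that would be its `cur` subdirectory). The only sa-learn options used are `--spam` and `--ham`.

```python
import shutil
import subprocess
from pathlib import Path


def learn_and_move(kind, learn_dir, dest_dir, run=subprocess.run):
    """Train sa-learn on the files in learn_dir, then move them to dest_dir.

    kind is "spam" or "ham". Returns the new paths of the moved messages.
    The name and the `run` hook are hypothetical, for illustration only.
    """
    if kind not in ("spam", "ham"):
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    # Snapshot the folder first, so a message that arrives mid-run is
    # neither trained nor moved until the next run.
    messages = sorted(p for p in Path(learn_dir).iterdir() if p.is_file())
    if not messages:
        return []
    # check=True raises if sa-learn fails, so nothing is moved untrained.
    run(["sa-learn", f"--{kind}", *map(str, messages)], check=True)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(m), str(dest / m.name))) for m in messages]
```

A cron job would call this once per folder, for example `learn_and_move("spam", "/home/user/Maildir/.learn.spam/cur", "/home/user/Maildir/.Junk/cur")`, where both paths are placeholders.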

I actually took this approach a little further. I have a script that monitors the learn-spam and learn-ham maildirs with inotify. As soon as a message is moved to those dirs, it gets learned and fed back to my sorting script. That way I don't have to do anything other than move to the learn dir.
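
The watcher itself is not shown in this thread, so the following is only a rough Python sketch of the idea, not the actual script. It swaps inotify for plain polling, because the Python standard library has no inotify binding; a real deployment would likely use an inotify library or `inotifywait` instead. The names `new_messages` and `watch`, and the `handle` callback (which would do the learning and re-sorting), are hypothetical.

```python
import time
from pathlib import Path


def new_messages(directory, seen):
    """Return files in directory that were not there on the previous call.

    `seen` is a set owned by the caller; it is updated in place to the
    current directory contents.
    """
    current = {p for p in Path(directory).iterdir() if p.is_file()}
    fresh = sorted(current - seen)
    seen.clear()
    seen.update(current)
    return fresh


def watch(directory, handle, interval=2.0, max_polls=None):
    """Call handle(path) for each newly arrived file (polling, not inotify).

    max_polls=None loops forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for message in new_messages(directory, seen):
            handle(message)
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

If `handle` moves each message out of the directory once it has been learned, the `seen` set only matters for messages that fail and stay behind.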

Some general recommendations:

Be careful with how much you automate the learning process. If your users are careless and your processes blindly trust their judgement, they can quickly poison your Bayes database. One example: many users will start classifying messages from a source they validly subscribed to as spam, because they no longer wish to be bothered by those messages, rather than going through the proper unsubscribe process.

You may want to divide your users into two broad groups: those whose judgement and responsibility you trust and who are allowed to train without review, and the rest, where you review the messages for valid classification before training.

So that would be *four* folders: two public folders exposed to your users that they can drop messages in, and two private folders that sa-learn trains from, which you (or someone whose judgement you trust) populate from the public training folders.
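
One way to picture the routing between those four folders, as a small Python sketch. The folder names, the function names, and the idea of keeping a set of trusted user names are all assumptions made up for the example; only the two-tier arrangement itself comes from the text above.

```python
# Hypothetical folder names; use whatever your IMAP layout provides.
PUBLIC_FOLDERS = {"spam": "public/learn-spam", "ham": "public/learn-ham"}
PRIVATE_FOLDERS = {"spam": "private/train-spam", "ham": "private/train-ham"}


def submission_folder(submitter, kind, trusted_users):
    """Pick the folder a user's submission should land in.

    Trusted users feed the private training folders directly; everyone
    else goes to the public folders, which are reviewed before training.
    """
    if kind not in PUBLIC_FOLDERS:
        raise ValueError(f"kind must be 'spam' or 'ham', got {kind!r}")
    if submitter in trusted_users:
        return PRIVATE_FOLDERS[kind]
    return PUBLIC_FOLDERS[kind]


def after_review(verdict):
    """Pick the training folder for a reviewed message.

    verdict is the reviewer's own classification ("spam" or "ham"), which
    may differ from the submitter's, or None to keep it out of training.
    """
    if verdict is None:
        return None
    return PRIVATE_FOLDERS[verdict]
```

The point of `after_review` taking the reviewer's verdict rather than the submitter's is that a newsletter dropped in the public spam folder can still end up trained as ham, or not trained at all.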

Always keep your training corpora so that you can review them, correct them if necessary, and if need be wipe and retrain the Bayes database from scratch. Don't discard messages after you've trained from them.
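
As a rough illustration of why the kept corpora matter, here is a Python sketch of a wipe-and-retrain step. It assumes the reviewed corpus lives under one root directory with `spam` and `ham` subdirectories, which is a layout invented for this example, as are the function names. The sa-learn options used are `--clear`, `--spam` and `--ham`.

```python
import subprocess
from pathlib import Path


def retrain_commands(corpus_root):
    """Build the sa-learn commands that rebuild the Bayes database.

    Assumes (hypothetically) that corpus_root contains "spam" and "ham"
    subdirectories holding the kept training messages.
    """
    root = Path(corpus_root)
    return [
        ["sa-learn", "--clear"],                    # wipe the existing database
        ["sa-learn", "--spam", str(root / "spam")],  # relearn the spam corpus
        ["sa-learn", "--ham", str(root / "ham")],    # relearn the ham corpus
    ]


def retrain(corpus_root, run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for command in retrain_commands(corpus_root):
        run(command, check=True)
```

Without the saved corpora there would be nothing for the second and third commands to learn from, which is the practical reason not to discard trained messages.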


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  There is no better measure of the unthinking contempt of the
  environmentalist movement for civilization than their call to
  turn off the lights and sit in the dark.            -- Sultan Knish
-----------------------------------------------------------------------
 4 days until Veterans Day
