On Mon, 7 Nov 2016, Sean Greenslade wrote:
> On November 7, 2016 9:26:29 AM PST, Eric Abrahamsen <e...@ericabrahamsen.net>
> wrote:
>> What a lot of people (including myself) do is have two IMAP folders
>> learn/spam and learn/ham. When a message is incorrectly classified you
>> put it in the right folder, then run sa-learn on a cron job, looking in
>> the appropriate folder, then afterwards move the message to Junk or
>> INBOX, depending.
> I actually took this approach a little further. I have a script that
> monitors the learn-spam and learn-ham maildirs with inotify. As soon as
> a message is moved to those dirs, it gets learned and fed back to my
> sorting script. That way I don't have to do anything other than move to
> the learn dir.
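For anyone following along, the learn-then-refile step described above might look roughly like this Python sketch. It assumes sa-learn is on the PATH and that each training folder is a plain directory holding one message per file (a real Maildir keeps its messages under cur/ and new/, which this ignores). The function name learn_and_refile and its run parameter are made up for illustration; they are not taken from either poster's script.

```python
import shutil
import subprocess
from pathlib import Path


def learn_and_refile(learn_dir, dest_dir, kind, run=subprocess.run):
    """Feed every message in learn_dir to sa-learn, then move it to dest_dir.

    kind is "spam" or "ham". run is a hypothetical injection point so the
    loop can be exercised without SpamAssassin installed.
    Returns the names of the messages that were learned and moved.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    done = []
    for msg in sorted(Path(learn_dir).iterdir()):
        if not msg.is_file():
            continue
        # Train on one message: sa-learn --spam FILE or sa-learn --ham FILE.
        result = run(["sa-learn", f"--{kind}", str(msg)])
        if getattr(result, "returncode", 0) != 0:
            continue  # leave the message in place so it can be retried
        shutil.move(str(msg), str(dest / msg.name))
        done.append(msg.name)
    return done
```

Run from cron, a call per direction would cover the setup above, e.g. learn_and_refile("learn/spam", "Junk", "spam") and learn_and_refile("learn/ham", "INBOX", "ham"), with the paths adjusted to wherever your folders actually live.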
Some general recommendations:
Be careful with how much you automate the learning process. If your users
are careless and your processes blindly trust their judgement, they can
quickly poison your own Bayes database. As one example of this: many users
will start classifying messages from a source they validly subscribed to
as spam, because they no longer wish to be bothered by those messages,
rather than going through the proper unsubscribe process.
You may want to divide your users into two broad groups: those whose
judgement and responsibility you trust and who are allowed to train
without review, and the rest, where you review the messages for valid
classification before training.
So that would be *four* folders: two public folders exposed to your users
that they can drop messages in, and two private folders that sa-learn
trains from, which you and/or someone whose judgement you trust populate
from the public training folders.
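A rough Python sketch of the promotion step between the public and private folders follows. Everything about the layout is an assumption made for illustration: per-user public folders at public/<user>/learn-spam and public/<user>/learn-ham, shared private folders at private/learn-spam and private/learn-ham, and a set of trusted user names. The function name promote is hypothetical too.

```python
import shutil
from pathlib import Path

FOLDERS = ("learn-spam", "learn-ham")  # assumed folder names


def promote(public_root, private_root, trusted):
    """Move trusted users' submissions into the private training folders.

    Returns the paths of submissions from everyone else; those stay where
    they are until a reviewer checks them and moves them by hand.
    """
    pending = []
    for user_dir in sorted(Path(public_root).iterdir()):
        if not user_dir.is_dir():
            continue
        for folder in FOLDERS:
            src = user_dir / folder
            if not src.is_dir():
                continue
            dest = Path(private_root) / folder
            dest.mkdir(parents=True, exist_ok=True)
            for msg in sorted(src.iterdir()):
                if not msg.is_file():
                    continue
                if user_dir.name in trusted:
                    # Prefix with the user name to avoid filename clashes.
                    target = dest / f"{user_dir.name}.{msg.name}"
                    shutil.move(str(msg), str(target))
                else:
                    pending.append(msg)
    return pending
```

sa-learn would then only ever be pointed at the private folders, and the returned list is what the reviewer still has to look at.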
Always keep your training corpora so that you can review and, if necessary,
correct them, and if needed wipe and retrain the Bayes database from
scratch. Don't discard messages after you've trained from them.
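With the corpora kept on disk, a wipe-and-retrain can be scripted along these lines. This is only a sketch: the corpus paths are placeholders, the function names and the run parameter are made up, and it assumes sa-learn is on the PATH and is run as the user whose Bayes database you mean to rebuild.

```python
import subprocess


def retrain_commands(spam_corpus, ham_corpus):
    """Return the sa-learn invocations for a wipe-and-retrain, in order."""
    return [
        ["sa-learn", "--clear"],                   # wipe the Bayes database
        ["sa-learn", "--spam", str(spam_corpus)],  # relearn the kept spam
        ["sa-learn", "--ham", str(ham_corpus)],    # relearn the kept ham
    ]


def retrain(spam_corpus, ham_corpus, run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for cmd in retrain_commands(spam_corpus, ham_corpus):
        run(cmd, check=True)
```

Something like retrain("corpus/spam", "corpus/ham") would then rebuild the database from whatever you have reviewed and kept.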
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79