On Mon, 16 Feb 2015, ttgh wrote:

John, by 'spam corpus' are you referring to the 'spam' side of the Bayesian
filter?

Correct.

If we manually delay/review these known-bad accounts, are we creating a window of opportunity for those same messages to pass through to current users?

To a degree, yes. You have to weigh that against the impact of false positives.

I've been assuming we would need to create an intentional delay (e.g. 60
seconds) on all Amavis processing,

I don't see the need for that.

combined with an exception for the known-bad addresses wherein they would immediately get added to the Bayesian filter. Does the Bayesian update in real-time as you feed it new spam or do you need to request a periodic rebuild?

It updates in real time if you have the autolearn option enabled. I don't use it (my user base is tiny) so others would have to provide practical advice on that.

Also I still don't understand why everyone is so reluctant to immediately
black-list messages based on these 100% known-bad addresses.

Because very little in the real world is 100% guaranteed, and it's generally felt better to get a few more spams than it is to misclassify and possibly lose hams.

For instance, is it possible for a bulk spam message to trigger false positives?

It depends. If you're in the pharmaceuticals business, potentially yes. :)

There is zero concern about valid company clients/contacts mistakenly
emailing these ex-employees, e.g. these were entry-level staff who did not
interact with clients and just used their office email for personal use (and
got themselves onto lots of spam lists in the process).

Okay, that does sound like a reasonably safe source, then.

I mean, we literally are 100% positive their incoming email is spam...

Well, if you're confident, then proceed. Give yourself a way to recover, though.

Forward mail for the "honeypot" mailboxes to your spam corpus mailbox and train from that, running sa-learn on a schedule depending on volume (hourly, every few hours, on down to once daily). Rotate the spam corpus mailbox daily or weekly to reduce the work sa-learn has to do wading through old messages, but do retain the historical mailboxes so you can review and reclassify messages (i.e. retrain as ham) if necessary.
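
As a rough illustration of that routine, here is a minimal Python sketch. The mailbox path, the dated archive naming and the function names are all hypothetical; the only SpamAssassin pieces it relies on are sa-learn's --spam, --ham and --mbox options. It assumes the corpus is a single mbox file and that nothing is delivering into it at the moment it is renamed; a real setup would need to coordinate with the delivery agent's locking.

```python
import datetime
import os
import subprocess


def rotated_name(mbox_path, day):
    """Return the dated archive name for a corpus mbox (hypothetical scheme)."""
    return f"{mbox_path}.{day.isoformat()}"


def sa_learn_command(mbox_path, as_spam=True):
    """Build the sa-learn invocation that trains on one mbox file."""
    label = "--spam" if as_spam else "--ham"
    return ["sa-learn", label, "--mbox", mbox_path]


def train(mbox_path, as_spam=True):
    """Run sa-learn on an mbox; intended to be called from cron (e.g. hourly)."""
    subprocess.run(sa_learn_command(mbox_path, as_spam), check=True)


def rotate_and_train(mbox_path, day=None):
    """Move the live corpus aside, then train on the archived copy.

    The archive is kept rather than deleted so that messages can be
    reviewed and relearned as ham later if they turn out to be misfiled.
    Assumes no delivery is in progress while the file is renamed.
    """
    day = day or datetime.date.today()
    archived = rotated_name(mbox_path, day)
    if os.path.exists(archived):
        raise FileExistsError(archived)  # never overwrite an older archive
    os.rename(mbox_path, archived)
    train(archived)  # picks up anything that arrived since the last run
    return archived
```

With something like this, cron could call train("/var/mail/spam-corpus") every hour or so and rotate_and_train("/var/mail/spam-corpus") once a day or week. To recover from a bad message, copy it out of the archive into its own mbox and call train(that_file, as_spam=False); sa-learn relearns it as ham.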

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  ...for a nation to tax itself into prosperity is like a man
  standing in a bucket and trying to lift himself up by the handle.
                                                 -- Winston Churchill
-----------------------------------------------------------------------
 6 days until George Washington's 283rd Birthday
