I've used spamassassin for many years - on Ubuntu, using amavisd - with
great success. In recent months, I've been receiving several spam
messages each day that evade the filters.
* These false negatives conform to a handful of simple, formulaic,
textual forms - on common subjects.
* The emails consist of fairly plain HTML and appear not to employ any
significant obfuscation.
* I have tried to train spamassassin with many of these spam samples -
without any effect.
* The bayes database is being updated. The bayes_journal (37 KB),
bayes_seen (5.2 MB) and bayes_toks (5.4 MB) files all have recent
timestamps.
* The false negatives all match BAYES_00 - attracting a default score of
-1.9. BAYES_00 seems to be at the crux of the misclassification.
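
To make the BAYES_00 effect concrete, here is a minimal Python sketch that parses an amavisd-style X-Spam-Status value and shows what the total would have been without the bayes contribution. The header text, the rule scores other than BAYES_00=-1.9, and the function name parse_spam_status are all made up for illustration; they are not taken from one of my real messages.

```python
import re


def parse_spam_status(header: str) -> dict:
    """Extract total score, threshold and per-rule scores from an
    amavisd-style X-Spam-Status value (tests=[RULE=score, ...])."""
    flat = " ".join(header.split())  # unfold continuation lines
    score = float(re.search(r"\bscore=(-?[\d.]+)", flat).group(1))
    required = float(re.search(r"\brequired=(-?[\d.]+)", flat).group(1))
    rules = {}
    tests = re.search(r"tests=\[([^\]]*)\]", flat)
    if tests:
        for item in tests.group(1).split(","):
            name, _, value = item.strip().partition("=")
            if name and value:
                rules[name] = float(value)
    return {"score": score, "required": required, "rules": rules}


# Hypothetical header value, shaped like the ones on my missed spam.
SAMPLE = """No, score=3.601 tagged_above=-999 required=5
    tests=[BAYES_00=-1.9, DCC_CHECK=1.1, HTML_MESSAGE=0.001,
    RAZOR2_CHECK=1.9, RDNS_NONE=0.8, URIBL_BLACK=1.7] autolearn=no"""

status = parse_spam_status(SAMPLE)
bayes = status["rules"].get("BAYES_00", 0.0)
without_bayes = status["score"] - bayes

print("total score:        ", status["score"])
print("BAYES_00 adds:      ", bayes)
print("score without bayes:", round(without_bayes, 3))
print("would be tagged:    ", without_bayes >= status["required"])
```

In this invented case the other rules alone would have put the message over the threshold; it is the -1.9 from BAYES_00 that drags it back under.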
Is there a way to delve into why these messages have been allocated such
a low bayes score - while (to a human) appearing to be blatant, simple
spam on "vanilla" spam topics? Has my bayes data been "poisoned" somehow?
It is worth noting that I get a lot of correctly identified spam - and
much of that matches BAYES_99 and BAYES_999... and my ham gets
BAYES_00... so, for many messages, bayes is working. Is it likely that I
am suffering poor performance (for these specific messages) as a result
of some tunable parameter?
What is the most effective way to tackle this?