On 5/16/2014 2:24 PM, Ian Zimmerman wrote:
> On Fri, 16 May 2014 07:22:56 -0400
> "David F. Skoll" <d...@roaringpenguin.com> wrote:
>
> James> Is there any way to limit Bayes content checking to only the
> James> first X characters of the message body?  I ask this because it is
> James> clear that the spam messages getting through contain text meant
> James> to poison the tests but this gibberish always trails the main
> James> message and is separated by a large white space in most cases.
>
> David> In my experience, trying to be too clever with Bayes is
> David> counter-productive.  Those Bayes-poisoning attacks rarely work on
> David> a well-trained corpus.  You probably just need more training for
> David> Bayes to figure out what's happening.
>
> In the last few (~10) days, I have seen a marked increase in FNs,
> usually with Bayes values in the 50s and 60s.  By marked, I mean I do
> pretty much nothing but adjust my various ad-hoc rules to keep from
> being flooded ;-\
>
> On close inspection, I see that the hash-busting garbage appended is
> (faux) technical computing talk instead of the usual cookbooks or
> classical literature :-p  That is, scrambled Stack Overflow discussions
> and the like.  And of course that is what most of my ham is about, so
> it makes very good sense that Bayes gets confused.

Keep in mind that BAYES_50 and BAYES_60 still contribute positive scores by default. Though BAYES_50 is technically a neutral result, both rules still add a point or two to the score.

Rather than messing with Bayes, I would focus on the spams you are seeing and try to find a common thread that you can use to make a custom rule or two to catch them. If they all have similar garbage appended to them, there are probably other similarities you could find.
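
One rough way to look for such a common thread is to count which short phrases show up in most of the missed spams. The sketch below is only an illustration and is not part of SpamAssassin: the function name common_phrases, its parameters, and the sample messages are all made up for the example. Any phrase it reports would still have to be checked against ham before going into a custom rule.

```python
import re
from collections import Counter


def common_phrases(bodies, n=3, min_share=0.8):
    """Return word n-grams that occur in at least min_share of the bodies.

    Hypothetical helper for illustration only; not a SpamAssassin API.
    """
    counts = Counter()
    for body in bodies:
        words = re.findall(r"[a-z0-9']+", body.lower())
        # Use a set so each message counts a given phrase at most once.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    needed = min_share * len(bodies)
    return sorted(p for p, c in counts.items() if c >= needed)


# Made-up sample bodies: a shared pitch followed by unrelated tech filler.
samples = [
    "Claim your prize today! Click here now.   java null pointer stack trace",
    "Hello friend, claim your prize today before it expires.   python list index",
    "You won: claim your prize today!   segfault in malloc kernel",
]

print(common_phrases(samples, n=3, min_share=1.0))
```

With a real set of saved false negatives in place of the samples, lowering min_share lets through phrases that appear in most, but not all, of the messages.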

--
Bowie
