On Fri, 16 May 2014 11:24:29 -0700 Ian Zimmerman <i...@buug.org> wrote:
> On close inspection, I see that the hash-busting garbage appended is
> (faux) technical computing talk instead of the usual cookbooks or
> classical literature :-p That is, scrambled Stack Overflow
> discussions and the like. And of course that is what most of my ham
> is about, so it makes very good sense that Bayes gets confused.

Well, that can happen sometimes... but not that often in my experience.

> 5593 0 non-token data: nspam
> 6190 0 non-token data: nham

Ah, I have a larger corpus: 4,608,013 spams and 4,146,168 hams. I suspect
that's why Bayes poisoning is not an issue for us.

Also, spammers will adjust their attack and put the poisonous stuff first,
obfuscated by a style="display: none" HTML attribute or similar. Best to
let Bayes work it out by itself, I think.

Regards,

David.