On Fri, August 9, 2013 1:01 pm, RW wrote:
> BAYES works on rendered text it doesn't see the HTML.
Hmmm. So it doesn't see HTML comments, which are present in the HTML source even though they are "invisible" once rendered? OK, in that case, I have NO idea why the spam isn't hitting Bayes, because it looks pretty damn spammy to me. I wonder if it's the heavy use of images, but I don't know.

> Do you actually get a significant amount of ham between 0.99 and 0.999?
> Personally I only get 1 in 1000 above 0.55, and nothing above 0.65.

Ham, absolutely not. So yes, I suppose I could just treat all Bayes99 as if it were Bayes999 and score it more highly than I do. Right now I have Bayes99 at 4 and Bayes999 at 4.5. I could eliminate Bayes999 and make Bayes99 score 4.5... but I do worry a little bit about FPs, even though I guess I shouldn't, statistically speaking.

On the other hand, one could consider making Bayes999 a poison pill. Generally spam will only rank there if you've learned something nearly identical to it. At that point, it might be worth just scoring it at 5 or higher (assuming your threshold is 5, as mine is).

--- Amir
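
For reference, the scores discussed above might look roughly like this in local.cf. This is only a sketch: it assumes the stock rule names BAYES_99 and BAYES_999, and the commented-out line is the hypothetical "poison pill" variant, not a tested recommendation.

```
# Threshold mentioned in the post
required_score 5.0

# Scores as currently described
score BAYES_99  4.0
score BAYES_999 4.5

# "Poison pill" variant: BAYES_999 alone reaches the threshold
# score BAYES_999 5.0
```

It is worth checking how the installed ruleset defines the two probability ranges. If both rules can fire on the same message, their scores add, which changes what the numbers above mean in practice.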