On Tue, 14 Oct 2014 16:10:52 +0200
Axb <axb.li...@gmail.com> wrote:

> and to avoid further discussions of what header may pollute bayes or
> not, I've removed all header entries which are not directly related
> to AV/filter products.

I'm not sure I agree with being too clever about Bayes. Surely, by its very nature, the Bayes algorithm will itself indicate which tokens are relevant and which are not? Isn't that the whole point of Bayes?

I think being too clever about massaging the data that gets fed to Bayes may be counter-productive. For sure, *some* massaging is in order: a token should be a semantic unit, so something like "www.example.com" should probably be one token rather than three. Beyond that, though, I wonder whether massaging the data does any good.

Regards,

David.
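
To illustrate the tokenization point above (keeping "www.example.com" as one token rather than three), here is a minimal Python sketch. It is not any real filter's tokenizer; the regular expression and the names TOKEN_RE, tokenize and naive_tokenize are assumptions made up for this example.

```python
import re

# Hypothetical pattern, for illustration only: a hostname-like run
# (labels joined by dots) is kept whole; anything else falls back to
# a plain run of word characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+|\w+")


def tokenize(text):
    """Split text into lowercase tokens, keeping hostnames intact."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def naive_tokenize(text):
    """Split on every non-word character, so hostnames get broken up."""
    return [t.lower() for t in re.findall(r"\w+", text)]


print(naive_tokenize("Visit www.example.com now"))
print(tokenize("Visit www.example.com now"))
```

The first call splits the hostname into "www", "example" and "com"; the second keeps it as the single token "www.example.com". Everything after that step is left to Bayes itself.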