On 10/14/2014 04:17 PM, David F. Skoll wrote:
On Tue, 14 Oct 2014 16:10:52 +0200
Axb <axb.li...@gmail.com> wrote:

and to avoid further discussions of what header may pollute bayes or
not, I've removed all header entries which are not directly related
to AV/filter products.

I'm not sure I agree with being too clever about Bayes.  Surely by its
very nature, the Bayes algorithm will itself indicate which tokens
are relevant and which are not?  Isn't that the whole point of Bayes?

I think being too clever about massaging the data that gets fed to
Bayes may be counter-productive.  For sure, *some* massaging is in order;
a token should be a semantic unit, so something like "www.example.com"
should probably be one token rather than three, but beyond that I wonder
if it's good or not to massage the data?

David,

The "bayes_ignore" file will not become part of SA's default .cf files.
My intention is to keep it in a central repository, in case somebody else wants to use it, instead of maintaining it only in my local repo.

I believe in *some* massaging, as in "works for me".

I assume it depends on how you feed bayes and what kind of traffic you deal with.

The concept of keeping Bayes from learning other filters' stuff is ancient (there's a commented example in local.cf), but as with so much in SA tuning, it's trial and possible error till you feel cozy.
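
For anyone who hasn't tried it, a minimal sketch of what such entries look like, using the bayes_ignore_header setting. The first three header names follow the kind of commented example found in local.cf; X-Virus-Scanned is only an illustrative assumption for an AV product header and is not taken from the actual file discussed here:

```
# Keep Bayes from tokenizing headers added by other filters.
# Header names are examples only; adjust to what your own
# mail path really adds.
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
# Hypothetical AV/filter product header (assumption)
bayes_ignore_header X-Virus-Scanned
```

Whether a given entry helps or hurts depends on your traffic, so test against your own corpus before relying on it.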


