On 14.10.2014 at 23:37, Axb wrote:
On 10/14/2014 11:08 PM, Adam Katz wrote:
On Tue, 14 Oct 2014 16:10:52 +0200 Axb <axb.li...@gmail.com> wrote:
and to avoid further discussions of what header may pollute bayes or
not, I've removed all header entries which are not directly related
to AV/filter products.

On 10/14/2014 07:17 AM, David F. Skoll wrote:
I'm not sure I agree with being too clever about Bayes.  Surely by its
very nature, the Bayes algorithm will itself indicate which tokens
are relevant and which are not?  Isn't that the whole point of Bayes?

I think being too clever about massaging the data that gets fed to
Bayes may be counter-productive.  For sure, *some* massaging is in
order: a token should be a semantic unit, so something like
"www.example.com" should probably be one token rather than three. But
beyond that, I wonder whether massaging the data does more good than harm.
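
To illustrate the "semantic unit" point, here is a minimal Python sketch of
a tokenizer that keeps a dotted hostname as one token. It is not
SpamAssassin's tokenizer; the name tokenize and both regular expressions
are assumptions made up for this example.

```python
import re

# Hypothetical patterns, for illustration only (not SpamAssassin's rules).
# A "hostname" is two or more dot-separated labels; a "word" is a plain run
# of letters, digits or apostrophes.
HOSTNAME = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
WORD = r"[A-Za-z0-9']+"

# Try the hostname pattern first so dotted names are not split at the dots.
TOKEN_RE = re.compile(f"{HOSTNAME}|{WORD}")


def tokenize(text):
    """Return lower-cased tokens, keeping dotted hostnames whole."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def naive_tokenize(text):
    """Split on anything that is not a letter or digit, for comparison."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


if __name__ == "__main__":
    sample = "Visit www.example.com now"
    print(tokenize(sample))        # hostname kept as a single token
    print(naive_tokenize(sample))  # hostname split into three tokens
```

With the naive splitter, "www", "example" and "com" are learned as three
unrelated tokens; with the hostname-aware one, the classifier sees a single
token for the whole name.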

The purpose of bayes_ignore_header is twofold:

  1. Prevent inheriting other systems' false positives (ensure better
     independence)
  2. Prevent relying upon headers that won't exist at delivery time (e.g.
     added by the mailbox server)

This is why it's so important to ignore other spam engines, which
basically fit into both of those categories.

I'd love to have the option (switch) to use Bayes on msg bodies ONLY,
though I doubt anybody would be a taker for such a project.
(I'd even be willing to "$pon$or" such an addition to SA)

Or something like the opposite of what we have now, i.e. an allow-list:

bayes_include_header received
bayes_include_header subject
bayes_include_header x-mailer
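
The difference between the existing ignore-list and the proposed
include-list can be sketched in Python as below. This is only an
illustration of the two filtering modes, not how SpamAssassin handles
headers internally; the function headers_for_bayes and its parameters are
hypothetical, and bayes_include_header itself is a proposal from this
thread rather than an existing option.

```python
def headers_for_bayes(headers, ignore=(), include=None):
    """Select which (name, value) header pairs would be fed to Bayes.

    headers: list of (name, value) tuples.
    ignore:  names to drop (deny-list, in the spirit of bayes_ignore_header).
    include: if given, ONLY these names are kept (allow-list, in the spirit
             of the proposed bayes_include_header); ignore is not consulted.
    Header names are compared case-insensitively.
    """
    if include is not None:
        allowed = {name.lower() for name in include}
        return [(n, v) for n, v in headers if n.lower() in allowed]
    denied = {name.lower() for name in ignore}
    return [(n, v) for n, v in headers if n.lower() not in denied]


if __name__ == "__main__":
    # Made-up sample message headers.
    msg = [
        ("Received", "from mail.example.com by mx.example.net"),
        ("Subject", "Hello"),
        ("X-Mailer", "ExampleMailer 1.0"),
        ("X-Spam-Flag", "YES"),
    ]
    # Deny-list mode: drop another filter's verdict header.
    print(headers_for_bayes(msg, ignore=["X-Spam-Flag"]))
    # Allow-list mode: keep only the three headers named above.
    print(headers_for_bayes(msg, include=["received", "subject", "x-mailer"]))
```

For this sample both calls keep the same three headers, but the allow-list
would also drop any header nobody had thought to ignore yet.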
