Sebastian Arcus <s.ar...@open-t.co.uk> wrote:
> On 23/12/16 10:12, Sebastian Arcus wrote:
>> I know this hot potato has been discussed before, but I'm afraid it's
>> back to haunt me and I can't fathom it out. I'm again getting different
>> Bayes results when I test a message on the command line compared to
>> when it goes through exim -> spamassassin.
>>
>> </snip>
>
> OK - after staring for a good while at debug logs, I think I finally
> found the culprit. The saved .eml file which I pass through spamc
> contains the report embedded by spamassassin in the headers (that's
> how my Exim is configured). This report includes the first few lines
> of the actual email body. This in turn effectively doubles the Bayes
> score, as spamassassin tokenizes these sample lines on top of the
> actual email body. As the email body for these particular spam emails
> is small, the sample in the header is almost equal in size to the
> text in the email body itself.
>
> As soon as I manually delete the SA headers and report in the .eml
> file, and pass the message again through spamc, I get identical Bayes
> scores to the ones when the message passes initially through Exim ->
> SA.
>
> However, this raises some interesting questions. It would appear that
> SA is incapable of recognising its own reports in the email headers,
> and tokenizes them as well, adding them to the Bayes calculation.
> Is that right?
>
> Also, since SA tokenizes all the info in the headers, does that mean
> my own email address, as the recipient of the email, will also be
> added to the database of spam tokens when I ask SA to learn a
> message as spam?
>
> I seem to have ended up with more questions than I started :-)

Have you considered using bayes_ignore_header in the SpamAssassin
configuration file?

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html
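
A minimal sketch, assuming your Exim ACL adds the report in headers
named X-Spam-Report and X-Spam-Score (hypothetical names here; use
whatever header names your Exim configuration actually adds). Something
like this in local.cf should keep those headers out of what the Bayes
classifier tokenizes:

```
# local.cf
# Hypothetical header names: replace them with the headers your
# Exim ACL actually adds. One header per line; repeat the option
# for each header you want Bayes to ignore.
bayes_ignore_header X-Spam-Report
bayes_ignore_header X-Spam-Score
```

Afterwards, running "spamassassin --lint" is a quick way to check that
the configuration still parses.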

-- 
A. Filip