Sebastian Arcus <s.ar...@open-t.co.uk> wrote:
> On 23/12/16 10:12, Sebastian Arcus wrote:
>> I know this hot potato has been discussed before - but I'm afraid it's
>> back to haunt me and I can't fathom it out. I'm getting again different
>> bayes results if I test a message on the command line, compared to it
>> going through exim -> spamassassin.
>>
>> </snip>
>
> OK - after staring for a good while at debug logs, I think I finally
> found the culprit. The saved .eml file which I pass through spamc
> contains the report embedded by spamassassin in the headers (that's
> how my Exim is configured). This report includes the first few lines
> of the actual email body. This in turn has the effect of effectively
> doubling the Bayes score, as spamassassin tokenizes these sample lines
> on top of the actual email body. As the email body for these
> particular spam emails is small - the sample in the header is almost
> equal in size to the text in the email body itself.
>
> As soon as I manually delete the SA headers and report in the .eml
> file, and pass the message again through spamc, I get identical Bayes
> scores to the ones when the message passes initially through Exim ->
> SA.
>
> However, this raises some interesting questions. It would appear that
> SA is incapable of recognising its own reports in the header of the
> emails, and tokenizes them as well and adds them to the Bayes
> report. Is that right?
>
> Also, does it mean that, as SA tokenizes all the info in the headers,
> my own email address, as the recipient of the email, will also be
> added to the database of spam tokens - when I ask SA to learn a
> message as spam?
>
> I seem to have ended up with more questions than I started :-)

Have you considered using bayes_ignore_header in the SpamAssassin
configuration file?

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Conf.html

--
A. Filip
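
For illustration, a minimal sketch of how bayes_ignore_header might be used, assuming the settings go in the site-wide configuration file (often local.cf). The header names below are placeholders, not taken from the original message: replace them with whatever headers your Exim configuration actually adds. The option takes one header name per line.

```
# Hypothetical header names; use the ones your Exim ACL really adds.
# Each listed header is skipped when Bayes tokenizes or learns a message.
bayes_ignore_header X-Spam_report
bayes_ignore_header X-Spam_score
bayes_ignore_header X-Spam_bar
```

After changing the configuration, spamd would need a restart to pick it up. Re-running the saved .eml through spamc should then show whether the Bayes score matches the one from the Exim -> SA pass.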