Hi Martin,

Thanks for the reply.

> Please keep your messages on the SA Users list.

Here's my Cc line on the message you replied to:

  Cc: RW <rwmailli...@googlemail.com>,
      "users@spamassassin.apache.org" <users@spamassassin.apache.org>

I don't know why it wouldn't go through to the list, perhaps I
shouldn't include spammy terms in the message body (I notice other
posters use zip attachments).

I don't have a "production" and "test" setup, just my laptop.

I'm sorry I missed your earlier suggestion to diff the outputs of
different "spamd" runs. I attach the output of the following commands:

  $ sudo spamd -u spamd -g spamd -x -D > spamd-u-g-x.out 2>&1
  $ sudo spamd -D > spamd.out 2>&1
  $ diff -u <(cat spamd-u-g-x.out | cut -f 5- -d ' ') <(cat spamd.out | cut -f 5- -d ' ') > spamd.diff

It looks like the second command is able to use my ancient Bayes token
database from my home directory, which I'd forgotten about, and gets
BAYES_999; while the first command uses the global database I just
trained from scratch yesterday (which I now see is in
/var/lib/spamassassin/.spamassassin/bayes_toks) with 3e4 ham and 3e4
spam, and only gets BAYES_60. It would be nice to be able to explain
that.

I could have sworn that there were differences in the other rules as
well, for side-by-side runs like this, but now I can't reproduce that.

Thanks for your help,

Frederick

On Sun, Dec 18, 2016 at 01:00:32AM +0000, Martin Gregorie wrote:
> On Sat, 2016-12-17 at 15:37 -0800, frede...@ofb.net wrote:
> > Thank you John, that does help clarify things a bit. Also thanks to
> > Martin - I was typing this message when I received yours, but maybe
> > this will answer some of your questions.
> >
> Please keep your messages on the SA Users list. Apart from anything
> else, by sending off-list messages, you're losing the chance for other
> eyes to see something the rest have missed.
>
> On the two examples you've quoted, it looks as if the score difference
> is due to a lack of URIBL responses, but I can't tell why from the
> evidence I've looked at except to point out that the absence of
> URI-BLOCKED in the low scored example is odd unless this test was done
> after you switched to using your own recursive, non-forwarding DNS
> server. Have you done that?
>
> I still don't know whether you're using the same configuration for
> production and testing, but the presence of Bayes results in only one
> set of results rather suggests that either they are not the same or
> that they are the same but you've configured per-user Bayes and one of
> the user-specific Bayes databases is untrained and/or hasn't yet seen
> 200 spams and 200 hams.
>
> BTW, the reason I suggested you do the parallel tests and diff their
> output was because that will highlight differences, which will make
> configuration differences much more obvious. You need to do this on a
> bigger set of messages and think about what any differences it reports
> is telling you about why your testing SA setup isn't getting the same
> results as your production SA.
>
> If you're absolutely certain that your production SA and SA test setups
> both have:
> - the configuration location defaulted
> - both are running on the same version of the OS
> - the glue[*] you're using to patch SA into your main chain is
>   duplicated in your test setup
>
> Then I suggest you check that the SA configurations are identical:
> - is the list of files the same on both configs?
> - are all the files in the config identical? Use 'diff' to make sure.
> - are both Sa systems running the same SA version?
>
> [*] 'glue' means the scripts or tools such as amavis-new, MIMEdefang,
>     etc
>
>
> Martin
>
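
The file-by-file check suggested above could be sketched as a small
shell function. This is only a sketch under assumptions: the name
compare_sa_configs is made up for illustration, and both directory
arguments are placeholders for wherever each setup keeps its
SpamAssassin configuration (nothing in this thread says where that is).

```shell
# compare_sa_configs DIR_A DIR_B  (hypothetical helper, not part of SA)
# Recursively compares two configuration directories and prints one line
# for every file that exists on only one side or whose contents differ.
# Exit status follows diff: 0 if the trees match, 1 if they differ.
compare_sa_configs() {
    diff -rq "$1" "$2"
}
```

No output and a zero exit status would mean the two trees hold the same
files with the same contents; any line printed names a file worth a
closer look with a plain 'diff -u'.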