Re: recent increase in spam getting through

frederik Sat, 17 Dec 2016 17:45:44 -0800

Hi Martin,

Thanks for the reply.


> Please keep your messages on the SA Users list. 

Here's my Cc line on the message you replied to:

Cc: RW <rwmailli...@googlemail.com>, "users@spamassassin.apache.org" 
<users@spamassassin.apache.org>

I don't know why it wouldn't go through to the list; perhaps I
shouldn't include spammy terms in the message body (I notice other
posters use zip attachments).

I don't have a "production" and "test" setup, just my laptop. I'm
sorry I missed your earlier suggestion to diff the outputs of
different "spamd" runs. I attach the output of the following commands:

    $ sudo spamd -u spamd -g spamd -x -D > spamd-u-g-x.out 2>&1
    $ sudo spamd -D > spamd.out 2>&1
    $ diff -u <(cat spamd-u-g-x.out | cut -f 5- -d ' ') <(cat spamd.out | cut -f 5- -d ' ') > spamd.diff

It looks like the second command is able to use my ancient Bayes token
database from my home directory, which I'd forgotten about, and gets
BAYES_999, while the first command uses the global database I just
trained from scratch yesterday (which I now see is in
/var/lib/spamassassin/.spamassassin/bayes_toks) with 3e4 ham and 3e4
spam, and only gets BAYES_60. It would be nice to be able to explain
that.
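
One way to check which Bayes database each run opened is to pull the
database paths out of the two debug logs. This is only a sketch: it
assumes the -D output prints the absolute path of each bayes_toks or
bayes_seen file it opens, and the function name bayes_db_paths is made
up for illustration.

```shell
# Hypothetical helper: list the distinct Bayes database files named in a
# spamd debug log. Assumes the log prints absolute paths ending in
# bayes_toks or bayes_seen.
bayes_db_paths() {
    # -o prints only the matching path, -E enables the (toks|seen) group.
    grep -oE '/[^[:space:]]*bayes_(toks|seen)' "$1" | sort -u
}
```

Running it on spamd-u-g-x.out and then on spamd.out should show whether
one run reads the home-directory database and the other the one under
/var/lib/spamassassin.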

I could have sworn that there were differences in the other rules as
well, for side-by-side runs like this, but now I can't reproduce that.
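
To look for differences in the other rules, the rule lists from two
runs can be compared directly. A rough sketch, assuming each input file
holds the comma-separated rule names from one run (for example the
tests= value of an X-Spam-Status header); compare_rules is an
illustrative name, not an SA tool.

```shell
# Hypothetical helper: show rule names that appear in only one of two runs.
# $1 and $2 are files containing comma-separated rule names.
compare_rules() {
    a=$(mktemp) && b=$(mktemp) || return 2
    # One rule name per line, whitespace and blank lines removed, then
    # sorted so that comm can compare the two lists.
    tr ',' '\n' < "$1" | tr -d ' \t' | sed '/^$/d' | sort -u > "$a"
    tr ',' '\n' < "$2" | tr -d ' \t' | sed '/^$/d' | sort -u > "$b"
    # comm -23 keeps lines unique to the first file, -13 to the second.
    comm -23 "$a" "$b" | sed 's/^/only in first: /'
    comm -13 "$a" "$b" | sed 's/^/only in second: /'
    rm -f "$a" "$b"
}
```

Rules hit by both runs are not printed, so an empty result means the
two rule lists are the same.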

Thanks for your help,

Frederick

On Sun, Dec 18, 2016 at 01:00:32AM +0000, Martin Gregorie wrote:
> On Sat, 2016-12-17 at 15:37 -0800, frede...@ofb.net wrote:
> > Thank you John, that does help clarify things a bit. Also thanks to
> > Martin - I was typing this message when I received yours, but maybe
> > this will answer some of your questions.
> > 
> Please keep your messages on the SA Users list. Apart from anything
> else, by sending off-list messages, you're losing the chance for other
> eyes to see something the rest have missed.
>
> On the two examples you've quoted, it looks as if the score difference
> is due to a lack of URIBL responses, but I can't tell why from the
> evidence I've looked at except to point out that the absence of URI-
> BLOCKED in the low scored example is odd unless this test was done
> after you switched to using your own recursive, non-forwarding DNS
> server. Have you done that?
> 
> I still don't know whether you're using the same configuration for
> production and testing, but the presence of Bayes results in only one
> set of results rather suggests that either they are not the same or
> that they are the same but you've configured per-user Bayes and one of
> the user-specific Bayes databases is untrained and/or hasn't yet seen
> 200 spams and 200 hams.
> 
> BTW, the reason I suggested you do the parallel tests and diff their
> output was because that will highlight differences, which will make
> configuration differences much more obvious. You need to do this on a
> bigger set of messages and think about what any differences it reports
> are telling you about why your testing SA setup isn't getting the same
> results as your production SA.
> 
> If you're absolutely certain that your production SA and SA test setups:
> - both use the default configuration location
> - both run on the same version of the OS
> - both use the same glue[*] to patch SA into your main chain
> 
> then I suggest you check that the SA configurations are identical:
> - is the list of files the same on both configs?
> - are all the files in the config identical? Use 'diff' to make sure.
> - are both SA systems running the same SA version?
> 
> [*] 'glue' means the scripts or tools such as amavisd-new, MIMEDefang,
>     etc.
> 
> 
> Martin
>
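
The configuration checks Martin lists above can be sketched as a
recursive diff of two configuration directories. This is a sketch under
assumptions: the two directory arguments are placeholders for wherever
each setup keeps its .cf files, and compare_config_dirs is an
illustrative name.

```shell
# Hypothetical helper: report which files differ, or exist on only one
# side, between two SpamAssassin configuration directories.
compare_config_dirs() {
    # -r recurses into subdirectories; -q names the differing files
    # without printing their contents.
    diff -rq "$1" "$2"
}
```

The SA version on each setup can be checked separately with
"spamassassin --version".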
