On Mon, 2011-02-21 at 10:57 -0700, Justin Gould wrote:
> I'm relatively new to Postfix, SpamAssassin and what not, so this might
> be a silly question.  I'm on SpamAssassin 3.3.1 on Debian Linux with
> Postfix.  I've inherited this configuration from a previous
> administrator and am still trying to understand it.  
> 
> To summarize up front, the two issues I need to resolve are: 1.  Both
> Postfix and Procmail are running SA against each message.  I know this
> needs to be one or the other.  2.  Postfix and Procmail are coming up
> with very different results for the same message, both through
> /usr/bin/spamc.
> 
> The previous administrator had configured SA to run BOTH as
> content_filter for Postfix AND in Procmail.
[...]

> This is causing some weirdness, which we would expect.  I'm seeing
> messages that have two radically different scores.  For instance the
> header might show:
> 
> X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.cscape.net.
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.9 required=6.0 tests=HTML_IMAGE_ONLY_20,
>         HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,NO_RELAYS,
>         SARE_MILLIONSOF,TO_NO_BRKTS_NOTLIST,URIBL_BLACK autolearn=no 
> version=3.3.1

These headers actually come from the second scan (compare with your
assumption below that the second process does not modify the headers ;).

Here is what appears to happen. The first SA run classifies the mail as
spam and generates a report mail with the original spam attached
unaltered (report_safe 1). The second SA run then examines this *report*
mail generated by SA -- note the NO_RELAYS rule hit in the header above.


> But the BODY shows:
> 
> Content analysis details:   (10.6 points, 5.0 required)

This report actually comes from the first scan. Since that scan saw all
the original headers, unlike the one above, there is no NO_RELAYS hit
here. Moreover, there *are* DNSBL hits, based on the original Received
headers. Those headers are not available to the second scan, because the
original mail is attached to the report.

Yet another difference between the two, resulting in different scores
per rule, is that the first scan (shown below) has Bayes enabled -- a
site-wide Bayes DB shared amongst all users, since this is the postfix
content_filter one. The second scan (see headers above) is evidently
being run with Bayes disabled.


>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  1.4 RCVD_IN_BRBL_LASTEXT   RBL: RCVD_IN_BRBL_LASTEXT
>                             [217.23.11.97 listed in bb.barracudacentral.org]
>  1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
>                             [URIs: only-3d.com]
> -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
>  0.3 SARE_MILLIONSOF        BODY: Millions of something.
>  1.5 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
>  0.4 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image area
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.5000]
>  0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
>  0.1 HTML_SHORT_LINK_IMG_3  HTML is very short with a linked image
>  3.5 TO_NO_BRKTS_DIRECT     To: misformatted and direct-to-MX
>  0.0 TO_NO_BRKTS_NOTLIST    To: misformatted and not a mailing list
> 
> A 6.7 point score discrepancy between the two.  This results in mail
> tagged as SPAM in the subject line (SA local.cf) making it to the
> user's inbox, because the second process/higher scoring is not
> modifying the message headers.

The first process generated a reporting wrapper message, with the
Subject rewritten -- and proper X-Spam headers in the wrapper message,
reflecting the report blob in the body.

The second process then overwrites the X-Spam headers.


> Obviously the higher score is correct.  Any suggestions on how I might
> track down why the scores are different, whether to use the Procmail
> or Postfix master.cf methods exclusively, etc?  

How to track it down? Turn to the SA users list and show exactly the
data you did. ;)  The tracking-down part I just solved and explained
above.

Which one to use depends on your preferences, needs and users. For
example, do you want a site-wide Bayes DB (postfix), or should your
users have individual Bayes DBs (procmail)? Either one would do [1].
Given that Bayes appears to be disabled during the per-user procmail
scan...

A quick fix is to comment out the spamc call in procmail (probably
keeping the delivery recipes) and keep the postfix content_filter.

That eliminates the erroneous second processing, and should already make
your system behave as expected. From there, you can take your time to
understand the processing and its subtle differences, and decide what to
tweak further, if anything.
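
For reference, the quick fix above might look roughly like this in the
procmailrc. This is only a sketch: your actual file was not posted, so I
am assuming the spamc call uses the common filter recipe form, and the
delivery recipe and its "spam" folder name are purely illustrative.

```
# Disabled: the postfix content_filter already runs SA on every message.
# :0fw
# | /usr/bin/spamc

# Delivery recipes stay as they are. Hypothetical example: file
# anything the content_filter tagged as spam into a separate folder.
:0:
* ^X-Spam-Status: Yes
spam
```

The delivery recipe keeps working because the X-Spam headers written by
the content_filter scan are no longer overwritten by a second scan.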


[1] Even both would do, IFF report_safe were set to 0 and Subject
    rewriting disabled.
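
    As a sketch, that would correspond to something like the following
    in local.cf. Both are standard SA options; the tag string in the
    commented-out line is only a placeholder, since your actual setting
    was not posted.

```
# Add X-Spam headers to the original message instead of wrapping it
# in a report mail.
report_safe 0

# Subject rewriting disabled: leave rewrite_header Subject unset.
# rewrite_header Subject *****SPAM*****
```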

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
