On Mon, 2011-02-21 at 10:57 -0700, Justin Gould wrote:
> I'm relatively new to Postfix, SpamAssassin and what not, so this might
> be a silly question. I'm on SpamAssassin 3.3.1 on Debian Linux with
> Postfix. I've inherited this configuration from a previous
> administrator and am still trying to understand it.
>
> To summarize up front, the two issues I need to resolve are: 1. Both
> Postfix and Procmail are running SA against each message. I know this
> needs to be one or the other. 2. Postfix and Procmail are coming up
> with very different results for the same message, both through
> /usr/bin/spamc.
>
> The previous administrator had configured SA to run BOTH as
> content_filter for Postfix AND in Procmail. [...]
> This is causing some weirdness, which we would expect. I'm seeing
> messages that have two radically different scores. For instance the
> header might show:
>
> X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.cscape.net.
> X-Spam-Level: ***
> X-Spam-Status: No, score=3.9 required=6.0 tests=HTML_IMAGE_ONLY_20,
>         HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,NO_RELAYS,
>         SARE_MILLIONSOF,TO_NO_BRKTS_NOTLIST,URIBL_BLACK autolearn=no
>         version=3.3.1

This actually is the second scan (compare to your assumption below of
the second process not modifying the headers ;).

It appears this is what happens: the first SA run classifies the mail as
spam and generates a report mail with the original spam attached
unaltered (report_safe 1). The second SA run then examines this *report*
mail generated by SA -- note the NO_RELAYS rule hit in the header above.

> But the BODY shows:
>
> Content analysis details: (10.6 points, 5.0 required)

This report is actually from the first processing. Since that run sees
all the original headers, unlike the one above, there is no NO_RELAYS
hit here. Moreover, there *are* DNSBL hits, based on the original
Received headers. Again unlike the above, where they are not available
because the original mail is attached to the report.

Yet another difference between the two, resulting in different scores
per rule, is that the first processing (shown below) has Bayes enabled --
a site-wide Bayes DB shared amongst all users, since this is the postfix
content_filter one. The second scan (see headers above) obviously is
being run with Bayes disabled.

>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  1.4 RCVD_IN_BRBL_LASTEXT   RBL: RCVD_IN_BRBL_LASTEXT
>                             [217.23.11.97 listed in bb.barracudacentral.org]
>  1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
>                             [URIs: only-3d.com]
> -0.0 SPF_HELO_PASS          SPF: HELO matches SPF record
>  0.3 SARE_MILLIONSOF        BODY: Millions of something.
>  1.5 HTML_IMAGE_ONLY_20     BODY: HTML: images with 1600-2000 bytes of words
>  0.4 HTML_IMAGE_RATIO_02    BODY: HTML has a low ratio of text to image area
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.8 BAYES_50               BODY: Bayes spam probability is 40 to 60%
>                             [score: 0.5000]
>  0.7 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
>  0.1 HTML_SHORT_LINK_IMG_3  HTML is very short with a linked image
>  3.5 TO_NO_BRKTS_DIRECT     To: misformatted and direct-to-MX
>  0.0 TO_NO_BRKTS_NOTLIST    To: misformatted and not a mailing list
>
> A 6.7 point score discrepancy between the two. This results in mail
> tagged as SPAM in the subject line (SA local.cf) making it to the
> user's inbox, because the second process/higher scoring is not
> modifying the message headers.

The first process generated a reporting wrapper message, with the
Subject rewritten -- and proper X-Spam headers in the wrapper message,
reflecting the report blob in the body. The second process then
overwrites the X-Spam headers.

> Obviously the higher score is correct. Any suggestions on how I might
> track down why the scores are different, whether to use the Procmail
> or Postfix master.cf methods exclusively, etc?

How to track it down? Turn to the SA users list and show exactly the
data you did. ;)  The tracking down part I just solved and explained
above.

Which one to use depends on your preferences, needs and users. For
instance, do you want a site-wide Bayes DB (postfix), or should your
users have individual Bayes DBs (procmail)? Either one would do [1].

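
If you want to see at a glance what differs between the two scans, the rule names from the X-Spam-Status header can be diffed against the ones in the body report. The following is only a minimal sketch in Python, not anything SA ships: the helper names rules_from_status and rules_from_report are made up for illustration, and the parsing assumes the header and report layouts exactly as quoted in this thread.

```python
import re


def rules_from_status(header):
    """Rule names from the tests= field of a (possibly folded) X-Spam-Status header."""
    # Unfold the header: drop the whitespace that follows each comma.
    unfolded = re.sub(r",\s+", ",", header)
    match = re.search(r"tests=(\S+)", unfolded)
    return set(match.group(1).split(",")) if match else set()


def rules_from_report(report):
    """Rule names from the 'pts rule name description' table of an SA report."""
    # A rule row starts with a score such as 1.4 or -0.0, followed by the rule name.
    return set(re.findall(r"^\s*-?\d+\.\d+\s+([A-Z][A-Z0-9_]*)", report, flags=re.MULTILINE))


# Data copied from the message quoted in this thread.
status_header = """X-Spam-Status: No, score=3.9 required=6.0 tests=HTML_IMAGE_ONLY_20,
        HTML_IMAGE_RATIO_02,HTML_MESSAGE,HTML_SHORT_LINK_IMG_3,NO_RELAYS,
        SARE_MILLIONSOF,TO_NO_BRKTS_NOTLIST,URIBL_BLACK autolearn=no
        version=3.3.1"""

body_report = """
 1.4 RCVD_IN_BRBL_LASTEXT RBL: RCVD_IN_BRBL_LASTEXT
     [217.23.11.97 listed in bb.barracudacentral.org]
 1.7 URIBL_BLACK Contains an URL listed in the URIBL blacklist
     [URIs: only-3d.com]
-0.0 SPF_HELO_PASS SPF: HELO matches SPF record
 0.3 SARE_MILLIONSOF BODY: Millions of something.
 1.5 HTML_IMAGE_ONLY_20 BODY: HTML: images with 1600-2000 bytes of words
 0.4 HTML_IMAGE_RATIO_02 BODY: HTML has a low ratio of text to image area
 0.0 HTML_MESSAGE BODY: HTML included in message
 0.8 BAYES_50 BODY: Bayes spam probability is 40 to 60%
     [score: 0.5000]
 0.7 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
 0.1 HTML_SHORT_LINK_IMG_3 HTML is very short with a linked image
 3.5 TO_NO_BRKTS_DIRECT To: misformatted and direct-to-MX
 0.0 TO_NO_BRKTS_NOTLIST To: misformatted and not a mailing list
"""

header_rules = rules_from_status(status_header)
report_rules = rules_from_report(body_report)

only_in_header = header_rules - report_rules  # hits unique to the second scan
only_in_report = report_rules - header_rules  # hits unique to the first scan

print("only in header (second scan):", sorted(only_in_header))
print("only in report (first scan): ", sorted(only_in_report))
```

For the data above, the only rule unique to the header is NO_RELAYS, while the DNSBL, SPF, Bayes, MIME_HTML_ONLY and direct-to-MX hits show up in the report alone. That matches the explanation of the second scan seeing the wrapper message rather than the original mail.
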
Given that Bayes appears to be disabled during the per-user procmail
scan...

A quick fix is to just comment out the spamc call in procmail (probably
keep the delivery recipes), and keep the postfix content_filter. That
eliminates the erroneous second processing, and should make your system
behave as expected already.

From there, you can take your time to understand the processing and its
subtle differences, and decide what to tweak further. If anything.


[1] Even both would do, IFF report_safe were set to 0 and Subject
    rewriting disabled.

-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}