Re: spamassissin filter very broken

Jim Maul Fri, 03 Feb 2006 09:47:06 -0800

> Hello...  I am a complete newbie with Spamassassin, so I hope you will all
> bear with me.   The job of fixing our spam filter has fallen on me, as the
> person who used to handle everything relating to our mail server recently
> left my company.
>
> We're running spamassassin 3.0.2 with perl 5.8.4, and exim 3.35 on Debian.
>
> In the last 3 weeks or so, we have started receiving a ton of spam,
> especially a lot of pharmaceutical ads and some other random nonsense.
> However, Spamassassin isn't completely broken. It is actually catching
> some spam but it's letting a lot go. When I look at the headers in the
> spam, I see SA is assigning ridiculously low scores to the ones that are
> getting through but normal scores to the ones it's stopping. Some of
> the spam that is getting through is actually receiving negative scores,
> which as I understand it shouldn't even be possible unless the spam is
> whitelisted somehow.
>
> Here are the SA related headers for a couple of spams that are getting
> through:
> X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at veritime.com
> X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on mail2
> X-Spam-Level: *
> X-Spam-Status: No, score=1.8 required=4.9 tests=BAYES_50,HTML_80_90,
>  HTML_MESSAGE,PORN_URL_SEX,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
>  UPPERCASE_25_50 autolearn=no version=3.0.2
> X-Spam-Veritime: Valid
>


Your Bayes database appears to be misclassifying some mail.  It was most
likely trained incorrectly (by autolearn, by manual training, or both).


> X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at veritime.com
> X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on mail2
> X-Spam-Level:
> X-Spam-Status: No, score=-2.6 required=4.9 tests=BAYES_00 autolearn=ham
>  version=3.0.2
> X-Spam-Veritime: Valid
>


Yes, your Bayes database is very broken.  I'd wipe it out and start
fresh.  One of the problems appears to be that your autolearn thresholds
need to be adjusted.  A message hitting only BAYES_00 should NOT be
autolearned.  I've set my autolearn threshold for ham to a small negative
number.  You may want to do the same to prevent things like this from
happening.


> As opposed to this one which is a spam message that SA caught:
> X-Spam-Flag: YES
> X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on mail2
> X-Spam-Level: *************
> X-Spam-Status: Yes, score=13.2 required=4.9 tests=BAYES_99,DIGEST_MULTIPLE,
>         DRUGS_ERECTILE,DRUG_DOSAGE,DRUG_ED_CAPS,HELO_DYNAMIC_IPADDR,
>         HTML_FONT_BIG,HTML_FONT_SIZE_LARGE,HTML_MESSAGE,HTML_SHOUTING5,
>         INVALID_DATE,PYZOR_CHECK,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
>         UPPERCASE_25_50 autolearn=spam version=3.0.2
> X-Spam-Veritime: Valid
> X-Spam-Veritime-Spam: True
>
>
>
> The first thing I did when I was looking at this was to run spamassassin
> --lint to see if it was parsing the configuration file properly and it gave
> me:
>
> config: SpamAssassin failed to parse line, skipping: use_terse_report 0
> config: SpamAssassin failed to parse line, skipping: auto_learn 1
> lint: 2 issues detected.  please rerun with debug enabled for more information.
>
> When I looked these up, I found that they were deprecated options from
> version 2.6x. At some point SA must have been updated with an apt-get
> update or something and the configuration file was not updated along with
> it. I'm not sure if this is the cause of the problem though, since it
> appears that SA was updated about a year ago and we've only been having
> problems with spam for the last few weeks.
>

Yeah, I'd fix the syntax, but I don't believe that is causing the problems
you are seeing.

> At this point I'm considering just wiping out spamassassin and
> reinstalling it fresh and having it relearn all of the spam that we have
> been saving for a while, but it seems there must be a better solution.
>

There is.  You don't need to completely wipe the SA install.  I'd just wipe
the Bayes files and start learning again.  If you choose to leave
autolearning on, make sure you adjust your thresholds.  Mine are at:

bayes_auto_learn_threshold_nonspam -0.1
bayes_auto_learn_threshold_spam 10.0
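
To see why the ham threshold matters, here is a rough Python sketch of the
autolearn decision.  It is a simplification under stated assumptions, not
SpamAssassin's actual code: the function name autolearn_decision is made up
for illustration, the defaults of 0.1 and 12.0 are the stock 3.x values as
I understand them, and the real logic applies further conditions (for
example, a minimum number of header and body points before learning as
spam).  As far as I know, the score used for autolearning leaves out the
BAYES_ rules, so a message that hits only BAYES_00 has an autolearn score
of 0.

```python
def autolearn_decision(non_bayes_score, ham_threshold=0.1, spam_threshold=12.0):
    """Simplified, hypothetical model of the autolearn decision.

    non_bayes_score: the message score with BAYES_* rules left out
    (assumption: this is how the autolearn score is computed).
    Returns "ham", "spam", or "no" like the autolearn= header field.
    """
    # Assumption: a message is learned as ham when it scores below the
    # nonspam threshold, and as spam when it reaches the spam threshold.
    if non_bayes_score < ham_threshold:
        return "ham"
    if non_bayes_score >= spam_threshold:
        return "spam"
    return "no"


if __name__ == "__main__":
    # The BAYES_00-only message from this thread: no other rules hit,
    # so its non-Bayes score is 0.0.
    print("stock thresholds:", autolearn_decision(0.0))
    print("my thresholds:   ", autolearn_decision(0.0, -0.1, 10.0))
```

With the stock thresholds, a score of 0 is below 0.1, so the message is fed
back in as ham, which matches the autolearn=ham in the header above.  With
the nonspam threshold at -0.1, the same message is left alone.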


> Does anybody have any ideas on what could be wrong and how I can fix this?
>  I'm being barraged by spam complaints from our users and spamassassin
> isn't trivial to jump right into without knowing something more about how
> it works.   I've been googling and looking through docs and not coming up
> with the answer.   Any help is very much appreciated.
>

Are you using all network tests?  I see some Razor hits in there, which is
good, but no SURBL (www.surbl.org) or anything like that.

Most of my spams look something like:
X-Spam-Status: Yes, hits=16.1 required=5.0 tests=BAYES_99,HTML_70_80,
        HTML_MESSAGE,JP_URI_RBL,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
        SPAMCOP_URI_RBL autolearn=no version=2.64

X-Spam-Status: Yes, hits=29.2 required=5.0 tests=AB_URI_RBL,BAYES_99,
        CLICK_BELOW,HTML_FONTCOLOR_BLUE,HTML_FONTCOLOR_RED,
        HTML_FONTCOLOR_UNSAFE,HTML_FONT_BIG,HTML_LINK_CLICK_HERE,HTML_MESSAGE,
        JP_URI_RBL,ONLINE_PHARMACY,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
        SPAMCOP_URI_RBL,UPPERCASE_25_50,VIAGRA,WS_URI_RBL autolearn=no
        version=2.64
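
If you want to check a pile of saved messages for network-test hits, a
small Python sketch like the following could pull the tests out of an
X-Spam-Status value and list any URI blocklist rules.  The function names
and the SAMPLE constant are made up for illustration.  The name patterns
are assumptions: *_URI_RBL is taken from the 2.64 headers above, and the
URIBL_ prefix is what I believe the 3.x SURBL rules use.

```python
import re

_NUM = r"(-?\d+(?:\.\d+)?)"

# One of the 2.64 headers quoted in this thread (value only, still folded).
SAMPLE = """Yes, hits=16.1 required=5.0 tests=BAYES_99,HTML_70_80,
        HTML_MESSAGE,JP_URI_RBL,RAZOR2_CF_RANGE_51_100,RAZOR2_CHECK,
        SPAMCOP_URI_RBL autolearn=no version=2.64"""


def parse_spam_status(value):
    """Parse a (possibly folded) X-Spam-Status header value into a dict."""
    # Unfold the header: collapse newlines and runs of spaces.
    text = " ".join(value.split())
    score = re.search(r"\b(?:score|hits)=" + _NUM, text)  # 3.x / 2.6x keys
    required = re.search(r"\brequired=" + _NUM, text)
    # The test list runs until the next "key=" field or the end of the text.
    tests = re.search(r"\btests=(.*?)(?=\s+\w+=|$)", text)
    autolearn = re.search(r"\bautolearn=(\w+)", text)
    names = tests.group(1).split(",") if tests else []
    return {
        "score": float(score.group(1)) if score else None,
        "required": float(required.group(1)) if required else None,
        "tests": [n.strip() for n in names if n.strip()],
        "autolearn": autolearn.group(1) if autolearn else None,
    }


def uri_blocklist_hits(tests):
    """Return tests that look like URI blocklist rules (assumed patterns)."""
    return [t for t in tests if t.endswith("_URI_RBL") or t.startswith("URIBL_")]


if __name__ == "__main__":
    status = parse_spam_status(SAMPLE)
    print(status["score"], uri_blocklist_hits(status["tests"]))
```

Run against the headers of spam that got through, an empty list from
uri_blocklist_hits on every message would suggest the URI blocklist tests
are not enabled or not firing.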

My average spam score is somewhere around 18 with a max of 42.3 (my
highest yet).  You may also want to try some SARE rules
(http://www.rulesemporium.com/) to boost the scores.

Note: I've raised the scores of the BAYES_ tests slightly on my system so
that BAYES_99 alone is enough to push a message into spam territory.  I
would not recommend doing this unless you are absolutely sure that your
Bayes database is extremely accurate (and maybe not even then, but hey,
it's my system ;) )

-Jim
