>> > I've been using 2.54 for a while, and I'm now in the process of
>> > upgrading to 2.55. I am using the default set of rules.
>> >
>> > Over the past several weeks, I've noticed an increasing amount of
>> > spam that is getting through SpamAssassin with scores in the 4.0-4.9
>> > range. This makes me wonder if perhaps some spammers have started to
>> > tailor their spams as follows: run the default version of
>> > SpamAssassin, feed their messages through it, and keep tweaking the
>> > messages until SpamAssassin lets them through.
>> >
>> > Does anyone else think that this could be possible? It seems to me
>> > that this process could be easily automated, or at least
>> > semi-automated.
>> > If indeed this is going on, what can we do about it?
>>
>> Train bayes. Everyone has a different bayes db, and they can't
>> work around that centrally.
>
> The problem I'm seeing is that I'm getting messages with a Bayes of 90%
> but it still slips through with 4.5-5.
>
> But, keep it in proportion. I'm still trapping over 98%.

I don't think there's anything sinister going on, simply that as new
spam techniques evolve, SA falls behind until the next release comes
out. A bit like a virus scanner gradually getting out of date before
it gets the latest updates.

I see some spams like those you mention with scores between 3 and 5
(our default threshold is 7), and from looking at the tests it is
apparent that only "generic" tests are being triggered on such spams,
such as the HTML font size tests, sometimes RBL tests, and sometimes
BAYES, but no tests specific to that kind of spam.

The good thing about this game of catch-up, however, is that even if
spammers are modifying their techniques to try to get around SA,
they're slowly painting themselves into a corner with regard to the
kinds of things they can say and put in messages without being detected
as spam.

The only kinds of spam that I see as fundamentally problematic for any
kind of scanner like SA are:

* URL-only messages, like some of the current porn ones, that don't say
  anything suspicious that you can reliably trigger on in the message
  (or don't say anything at all) but have a URL to a dubious site.
  Perhaps checking the URL against an RBL that lists websites
  referenced from spam is the answer here, rather than every copy of SA
  trying to read the site and analyze its content.

* Image-only spams (text in the images). A very tough one to crack, and
  perhaps impossible to solve, although if the images are loaded from
  an image server somewhere, the same RBL technique mentioned above
  could be applied: spams with image tags pointing to servers listed in
  the above RBL of "dubious sites" would also score highly.

Perhaps if no RBL exists of sites referenced in links and image tags
from spam, someone could start one up?

Thoughts, anyone?

Regards,
Simon
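
P.S. To make the RBL idea a bit more concrete, here is a rough sketch in
Python of what such a check might look like. Everything in it is an
assumption for illustration: the zone name spam-sites.example.org is
made up, the function names are mine, and none of this is how SA
currently works. It pulls hostnames out of links and image tags, then
looks each one up the way mail-server RBLs are usually queried (the
name is prepended to the list's zone, and any answer is taken to mean
"listed").

```python
import re
import socket
from urllib.parse import urlsplit

# Hypothetical zone name, for illustration only.
ZONE = "spam-sites.example.org"

# Matches http/https URLs in plain text, href="..." and src="..." alike.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def referenced_hosts(body):
    """Return the set of lower-cased hostnames of all URLs found in body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, nothing to look up
        if host:
            hosts.add(host.lower())
    return hosts


def dns_lookup(name):
    """Return True if name resolves, treated here as the 'listed' signal."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        return False


def listed_hosts(body, zone=ZONE, lookup=dns_lookup):
    """Return the sorted hosts from body that the list reports as listed."""
    return sorted(h for h in referenced_hosts(body) if lookup(f"{h}.{zone}"))
```

A hit from listed_hosts() could then add to the message's score like any
other test. Since image tags are covered by the same URL match, this
would handle the image-only case too, as long as the images are fetched
from a listed server. The lookup argument is only there so the logic
can be tried out without a real list behind it.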
_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk