On Fri, 27 Sep 2002, Bob Proulx wrote:

> Daniel Quinlan <[EMAIL PROTECTED]> [2002-09-27 20:12:53 -0700]:
> > My problem with SPEWS is that it is not an accurate way to tag spam.
> > There are too many FPs.
>
> I have heard a lot of derisive commentary about relays.osirusoft.com
> but people still use them.  Why?  Here is some data.
>
> list spews as the reason.  All 105 were spam.  That is an average of
> 21 per week, 3 a day, every day, with zero false positives.  I also

You're luckier than I am. I get perhaps 1000 spams a day that are caught by spamassassin (2.20, about to upgrade to 2.4x), plus a bunch more that spamassassin doesn't trap, as I have it set up now. Of the 1000 spams that spamassassin catches, perhaps half are caught by relays.osirusoft.com, and the other half by other rules. Unfortunately, I've had the same email addresses for years, and am on tons of lists - so my addresses get harvested a lot.

Of these, I usually get 1-5 false positives a day. For business reasons, throwing away false positives causes more inconvenience than wading through my "caught spam" folder - so I eyeball that folder manually, which turns out not to be that hard to do quickly using Lynx as a mail reader.

I've been getting a pretty good sense of how well various rules and block lists work. Unfortunately, every time I adjust my rule file, the spammers change tactics, sigh.

A couple of times a day, I wade through what spamassassin has intercepted as follows:

i. a grep on "razor" selects a quarter of the messages, no false positives yet

ii. a grep on osirusoft - which yields about 1/2 the messages - but... when there's a false positive, there's a really good chance that it's in this group - and of this class of false positives, there's a close to 100% likelihood that it's SPEWS that's given the false positive - i.e., everything else that shows up in relays.osirusoft.com doesn't give false positives

iii. then there's everything else

> People continue to use relays.osirusoft.com because it is very
> effective at tagging spam.

No. People use it because it's EASY - it aggregates lots of data sources into a single query. Unfortunately, it seems that a lot of sysops don't keep track of what's getting aggregated, and don't care a lot about blocking false positives - until somebody notices that they missed an important message (hard to do) and complains about it (hard to complain about something you don't notice).
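
For anyone who wants to try the same three-pass review, here is a rough Python sketch of it. The substring tests stand in for the greps described above; matching case-insensitively and treating each message as one string are my assumptions, and any header text you feed it is up to you, not something spamassassin guarantees.

```python
def triage(messages):
    """Split caught messages into the three review groups, checked in order.

    Each message is a plain string (headers plus body). A message that
    mentions both razor and osirusoft lands in the razor group, because
    that pass runs first.
    """
    groups = {"razor": [], "osirusoft": [], "other": []}
    for text in messages:
        lowered = text.lower()  # assumption: case-insensitive match
        if "razor" in lowered:
            groups["razor"].append(text)
        elif "osirusoft" in lowered:
            groups["osirusoft"].append(text)
        else:
            groups["other"].append(text)
    return groups
```

Reviewing the osirusoft group last and most carefully matches the observation that nearly all of the false positives turn up there.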

I discovered all this about a year ago when the head of our local PTO sent a message to a parents' list, and didn't get her copy back. Since I provide the list for the PTO, she sent me a query - which led to discovering the wonderful world of collateral damage.

> When there is something that is similarly effective that avoids the
> spews problem then people will flock to it.  But nothing else exists
> that is similarly effective at this moment in time.  I am not willing
> to get an additional 21 spam messages a week when I am not seeing any
> false positives myself.

But then, you might not notice it unless you're looking, and you have no way of noticing when your messages don't get through at the other end! That's the insidious nature of things.

> > We (SpamAssassin) need to separate out SPEWS listings so that the GA
> > can assign an appropriate score (could be higher or lower, but I
> > would wager that it will be lower).
>
> Continuing that thought, I disagree that RBLs should be used by the GA
> in SA at all.  That RBL data can be different at different times.  I
> really believe the GA should be trained on the content of the message
> without RBL input.  If it is deprived of the RBL information will the
> GA have improved results over the content?  It should.  I believe the
> RBL input to SA should be a manual control at a low value.  Any
> individual user can increase the score if they desire.

Now that's the real point - and one of the reasons for using spamassassin instead of just subscribing to an RBL. The specific problem with relays.osirusoft.com is that it aggregates so much data, including a single source (spews) that has a particularly high false positive rate. The ability to fine-tune, using multiple RBLs with different weights, gets past this, but at the expense of more queries.

Personally, I'd sure like it if I could get, with a single query, an RBL that consisted of relays.osirusoft.com, with all spews listings deleted.
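
To make the weighting idea concrete, here is a small Python sketch. The source names and numbers are made up for illustration; they are not spamassassin's rule names or scores, and this does no DNS lookups at all. It only shows how per-source weights, plus an option to drop one source entirely, would change the score a relay gets.

```python
# Hypothetical per-source weights; names and values are illustrative only.
WEIGHTS = {
    "open_relay": 2.0,
    "spam_source": 2.5,
    "dialup": 1.0,
    "spews": 0.5,  # scored low because of its false positive rate
}


def rbl_score(listed_sources, weights=WEIGHTS, ignore=()):
    """Sum the weights of the sources that list a relay.

    Sources named in `ignore` contribute nothing, which approximates
    "the aggregate list with one source deleted". Unknown sources
    score zero.
    """
    return sum(
        weights.get(source, 0.0)
        for source in set(listed_sources)
        if source not in ignore
    )
```

With these made-up weights, a relay listed only by spews scores 0.5, or nothing at all when called with ignore=("spews",), while a relay that is also listed as an open relay still picks up that source's weight.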

Miles

**************************************************************************
The Center for Civic Networking          PO Box 600618
Miles R. Fidelman, President &           Newtonville, MA 02460-0006
Director, Municipal Telecommunications Strategies Program
617-558-3698   fax: 617-630-8946
[EMAIL PROTECTED]   http://civic.net/ccn.html

Information Infrastructure: Public Spaces for the 21st Century
Let's Start With: Internet Wall-Plugs Everywhere
Say It Often, Say It Loud: "I Want My Internet!"
**************************************************************************