On Fri, 27 Sep 2002, Bob Proulx wrote:

> Daniel Quinlan <[EMAIL PROTECTED]> [2002-09-27 20:12:53 -0700]:
> > My problem with SPEWS is that it is not an accurate way to tag spam.
> > There are too many FPs.
>
> I have heard a lot of derisive commentary about relays.osirusoft.com
> but people still use them.  Why?  Here is some data.
>
> list spews as the reason.  All 105 were spam.  That is an average of
> 21 per week, 3 a day, every day, with zero false positives.  I also

You're luckier than I am.

I get perhaps 1000 spams a day that are caught by spamassassin (2.20,
about to upgrade to 2.4x), plus a bunch more that spamassassin doesn't
trap, as I have it set up now.

Of the 1000 spams that spamassassin catches, perhaps half are caught by
relays.osirusoft.com, and the other half by other rules.  Unfortunately,
I've had the same email addresses for years, and am on tons of lists - so
my addresses get harvested a lot.

Of these, I usually get 1-5 false positives a day.  For business reasons,
throwing away false positives causes more inconvenience than wading
through my "caught spam" folder - so I eyeball my "caught spam" folder
manually, which turns out to be quick and not that hard using Lynx as a
mail reader.

I've been getting a pretty good sense of how well various rules and block
lists work.  Unfortunately, every time I adjust my rule file, the spammers
change tactics, sigh.

A couple of times a day, I wade through what spamassassin has
intercepted as follows:

i. a grep on "razor" selects a quarter of the messages, no false
positives yet

ii. a grep on osirusoft - which yields about 1/2 the messages -
but when there's a false positive, there's a really good chance that
it's in this group - and for this class of false positives, there's a close
to 100% likelihood that SPEWS is what caused it - i.e., nothing else that
shows up in relays.osirusoft.com gives false positives

iii. then there's everything else
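For anyone wanting to script the same pass, here is a rough sketch of that triage in Python. It assumes each caught message is available as a plain string containing SpamAssassin's report text; the function name `triage` and the bucket names are made up for illustration. Unlike two independent greps, it puts each message in the first bucket that matches, so nothing gets reviewed twice.

```python
from typing import Dict, List


def triage(messages: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: split caught spam into the three review piles,
    checking "razor" first, then "osirusoft", then everything else."""
    buckets: Dict[str, List[str]] = {"razor": [], "osirusoft": [], "other": []}
    for msg in messages:
        text = msg.lower()  # case-insensitive, like grep -i
        if "razor" in text:
            buckets["razor"].append(msg)
        elif "osirusoft" in text:
            buckets["osirusoft"].append(msg)
        else:
            buckets["other"].append(msg)
    return buckets


if __name__ == "__main__":
    caught = ["hit by Razor", "listed in relays.osirusoft.com", "caught by body rules"]
    for name, pile in triage(caught).items():
        print(name, len(pile))
```

Given the numbers above, the "osirusoft" pile is the one worth reading most carefully.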

> People continue to use relays.osirusoft.com because it is very
> effective at tagging spam.

No. People use it because it's EASY - it aggregates lots of data sources
into a single query.  Unfortunately, it seems that a lot of sysops don't
keep track of what's getting aggregated, and don't care a lot about
blocking false positives - until somebody notices that they missed an
important message (hard to do) and complains about it (hard to complain
about something you don't notice).

I discovered all this about a year ago when the head of our local PTO sent
a message to a parents' list, and didn't get her copy back.  Since I
provide the list for the PTO, she sent me a query - which led to
discovering the wonderful world of collateral damage.

> When there is something that is similarly effective that avoids the
> spews problem then people will flock to it. But nothing else exists
> that is similarly effective at this moment in time.  I am not willing
> to get an additional 21 spam messages a week when I am not seeing any
> false positives myself.

But then, you might not notice it unless you're looking, and you have no
way of noticing when your messages don't get through at the other end!
That's the insidious nature of things.

> > We (SpamAssassin) need to separate out SPEWS listings so that the GA
> > can assign an appropriate score (could be higher or lower, but I
> > would wager that it will be lower).
>
> Continuing that thought, I disagree that RBLs should be used by the GA
> in SA at all.  That RBL data can be different at different times.  I
> really believe the GA should be trained on the content of the message
> without RBL input.  If it is deprived of the RBL information will the
> GA have improved results over the content?  It should.  I believe the
> RBL input to SA should be a manual control at a low value.  Any
> individual user can increase the score if they desire.

Now that's the real point - and one of the reasons for using spamassassin
instead of just subscribing to an RBL.
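As a rough illustration of Bob's proposal: the content score is whatever the trained rules produce, and each RBL contributes only a small, manually chosen weight that a user can raise. The default weight of 0.5, the threshold of 5.0, and the list name `relays.example` below are all assumptions for the sketch, not SpamAssassin's actual values or API.

```python
from typing import Dict, Iterable, Optional

DEFAULT_RBL_WEIGHT = 0.5  # assumed "low value" for any RBL hit
THRESHOLD = 5.0           # assumed spam threshold


def total_score(content_score: float,
                listed_on: Iterable[str],
                rbl_weights: Optional[Dict[str, float]] = None) -> float:
    """Content score plus a manually set weight per RBL the relay is listed on."""
    weights = rbl_weights if rbl_weights is not None else {}
    return content_score + sum(weights.get(rbl, DEFAULT_RBL_WEIGHT) for rbl in listed_on)


def is_spam(score: float, threshold: float = THRESHOLD) -> bool:
    return score >= threshold


if __name__ == "__main__":
    # Borderline content plus one RBL hit at the low default stays under the line...
    print(is_spam(total_score(4.2, ["relays.example"])))
    # ...until a user decides to trust that list more.
    print(is_spam(total_score(4.2, ["relays.example"], {"relays.example": 1.5})))
```

The point of keeping the RBL weight manual is that a list like SPEWS, with its higher false positive rate, can't push an otherwise clean message over the threshold unless the user chooses that trade-off.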

The specific problem with relays.osirusoft.com is that it aggregates so
much data, including a single source (spews) that has a particularly high
false positive rate.  The ability to fine tune, using multiple RBLs with
different weights, gets past this, but at the expense of more queries.
Personally, I'd sure like it if I could get, with a single query, an RBL
that consisted of relays.osirusoft.com, with all spews listings deleted.
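A sketch of what that filtering could look like on the client side. A DNSBL lookup queries the IPv4 address's octets in reverse order under the list's zone, and an answer in 127.0.0.0/8 means "listed"; aggregated lists use different answer addresses for different sources. Which answer code means SPEWS is an assumption here (`127.0.0.4` is a placeholder, not a documented value), as are the helper names.

```python
import ipaddress
import socket
from typing import Iterable, List, Set

# Placeholder: answer codes assumed to mean "listed because of SPEWS".
SPEWS_CODES: Set[str] = {"127.0.0.4"}


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the zone, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def lookup(ip: str, zone: str) -> List[str]:
    """Return the A-record answers for the DNSBL query; empty list if not listed."""
    try:
        return socket.gethostbyname_ex(dnsbl_query_name(ip, zone))[2]
    except OSError:  # NXDOMAIN and other resolution failures
        return []


def listed_without_spews(answers: Iterable[str], excluded: Set[str] = SPEWS_CODES) -> bool:
    """True if any answer comes from a source other than the excluded ones."""
    return any(a not in excluded for a in answers)
```

This still costs one query per lookup; it just throws away the SPEWS-coded answers after the fact, which only works if the aggregator actually distinguishes its sources by answer code.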

Miles


**************************************************************************
The Center for Civic Networking             PO Box 600618
Miles R. Fidelman, President &              Newtonville, MA 02460-0006
Director, Municipal Telecommunications
Strategies Program                          617-558-3698 fax: 617-630-8946
[EMAIL PROTECTED]                      http://civic.net/ccn.html

Information Infrastructure: Public Spaces for the 21st Century
Let's Start With: Internet Wall-Plugs Everywhere
Say It Often, Say It Loud: "I Want My Internet!"
**************************************************************************



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk