Daniel Quinlan <[EMAIL PROTECTED]> [2002-09-27 20:12:53 -0700]:
> My problem with SPEWS is that it is not an accurate way to tag spam.
> There are too many FPs.

I have heard a lot of derisive commentary about relays.osirusoft.com
but people still use it.  Why?  Here is some data.

I just went through the last five weeks of spam trash directed to my
address.  I had 105 hits on relays.osirusoft.com that listed SPEWS as
the reason.  All 105 were spam.  That is an average of 21 per week,
3 a day, every day, with zero false positives.  I also had 188 hits
from relays.osirusoft.com that listed reasons other than SPEWS.  That
makes 293 messages in total that relays.osirusoft.com correctly
listed, with zero false positives, over the course of five weeks.
YMMV, definitely, but this is the data from my site.
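
As a quick check of the arithmetic above, here is a minimal Python
sketch.  The counts are the ones quoted in this message; the variable
names are only illustrative.

```python
# Counts quoted in the message above (five weeks of mail at one site).
spews_hits = 105   # relays.osirusoft.com hits listing SPEWS as the reason
other_hits = 188   # relays.osirusoft.com hits listing other reasons
weeks = 5

total_hits = spews_hits + other_hits
spews_per_week = spews_hits / weeks
spews_per_day = spews_per_week / 7

print(total_hits)       # 293
print(spews_per_week)   # 21.0
print(spews_per_day)    # 3.0
```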

For the record, I also understand the collateral damage problem and
am of a similar mind to Matt, Dan, and a few of the others.

People continue to use relays.osirusoft.com because it is very
effective at tagging spam.  When something comes along that is
similarly effective and avoids the SPEWS problem, people will flock
to it.  But nothing else that effective exists right now.  I am not
willing to get an additional 21 spam messages a week when I am not
seeing any false positives myself.

> We (SpamAssassin) need to separate out SPEWS listings so that the GA
> can assign an appropriate score (could be higher or lower, but I
> would wager that it will be lower).

Continuing that thought, I disagree that RBLs should be used by the GA
in SA at all.  RBL data can be different at different times.  I
really believe the GA should be trained on the content of the message
without RBL input.  If the GA is deprived of the RBL information, will
it produce better scores for the content rules?  It should.  I believe
the RBL input to SA should be a manual control set to a low value.
Any individual user can increase the score if they desire.

The strength of SA is that it looks at the content and categorizes the
message based upon that content, with no _single_ score having too
much leverage on the result.  I would like to see SA continue to push
in that area and avoid the distraction of using the RBL information in
the GA-generated scores.  RBL data should be a manually scored input
to SA but not an overwhelmingly large input.  If it really is spam
then the content will drive the score over the threshold.
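
To illustrate the idea, here is a rough Python sketch of content
scores plus a small, manually set RBL score.  It is not SpamAssassin
code: the function names, the 0.5 RBL score, the 5.0 threshold, and
the sample content scores are all assumptions chosen for illustration.

```python
# Hypothetical values, not taken from any real SpamAssassin configuration.
THRESHOLD = 5.0           # assumed spam threshold
MANUAL_RBL_SCORE = 0.5    # assumed low, hand-set score for an RBL hit


def total_score(content_scores, rbl_listed, rbl_score=MANUAL_RBL_SCORE):
    """Sum the content rule scores and add the small RBL score if listed."""
    score = sum(content_scores)
    if rbl_listed:
        score += rbl_score
    return score


def is_spam(content_scores, rbl_listed, rbl_score=MANUAL_RBL_SCORE):
    """A message is tagged when its total reaches the threshold."""
    return total_score(content_scores, rbl_listed, rbl_score) >= THRESHOLD


# Content alone carries real spam over the threshold.
print(is_spam([2.1, 1.8, 1.4], rbl_listed=False))   # True

# An RBL listing by itself cannot tag an otherwise clean message.
print(is_spam([0.3], rbl_listed=True))               # False

# The RBL hit only tips a message that content already finds suspicious.
print(is_spam([2.0, 2.7], rbl_listed=True))          # True
```

A user who trusts a particular RBL more could pass a larger rbl_score,
which is the manual control described above.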

Bob
