Daniel Quinlan <[EMAIL PROTECTED]> [2002-09-27 20:12:53 -0700]:
> My problem with SPEWS is that it is not an accurate way to tag spam.
> There are too many FPs.

I have heard a lot of derisive commentary about relays.osirusoft.com, but people still use it. Why? Here is some data.

I just went through the last 5 weeks of spam trash that was directed to me at my address. I had 105 hits on relays.osirusoft.com which list SPEWS as the reason. All 105 were spam. That is an average of 21 per week, 3 a day, every day, with zero false positives. I also had 188 from relays.osirusoft.com which listed reasons other than SPEWS. That makes 293 total messages that relays.osirusoft.com correctly listed and zero false positives over the course of five weeks. YMMV definitely in this case, but this is the data from my site.

For the record, I also understand the collateral damage problem and am of similar mind to Matt and Dan and a few of the others. People continue to use relays.osirusoft.com because it is very effective at tagging spam. When there is something that is similarly effective and avoids the SPEWS problem, people will flock to it. But nothing else exists that is similarly effective at this moment in time. I am not willing to get an additional 21 spam messages a week when I am not seeing any false positives myself.

> We (SpamAssassin) need to separate out SPEWS listings so that the GA
> can assign an appropriate score (could be higher or lower, but I
> would wager that it will be lower).

Continuing that thought, I disagree that RBLs should be used by the GA in SA at all. RBL data can be different at different times. I really believe the GA should be trained on the content of the message without RBL input. If it is deprived of the RBL information, will the GA have improved results over the content? It should.

I believe the RBL input to SA should be a manual control at a low value. Any individual user can increase the score if they desire. The strength of SA is to look at the content and categorize the message based upon the content, with no _single_ score having too much leverage on the result.
I would like to see SA continue to push in that area and to avoid the distraction of using the RBL information in the GA-generated scores. RBL data should be a manually scored input to SA but not an overwhelmingly large input. If it really is spam then the content will drive the score over the threshold.

Bob
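
To make the scoring argument above concrete, here is a minimal sketch of threshold scoring where content rules carry the tuned scores and an RBL hit is a small, hand-set input that a user may raise. Everything in it is an assumption for illustration: the rule names, the score values, the threshold of 5.0, and the helpers `total_score` and `is_spam` are hypothetical, not SpamAssassin's actual rules, scores, or code.

```python
# Hypothetical rule names and scores, for illustration only. These are not
# SpamAssassin's real rule set or its GA-assigned values.
CONTENT_SCORES = {
    "SUBJ_ALL_CAPS": 1.2,
    "CLICK_BELOW": 1.5,
    "REMOVE_ME_LINK": 1.8,
    "HTML_ONLY": 1.0,
}

# RBL hits are scored by hand at a low value, outside any automatic tuning.
RBL_SCORES = {
    "RBL_LISTED": 0.5,
}

# Assumed spam threshold for this sketch.
THRESHOLD = 5.0


def total_score(hits, user_overrides=None):
    """Sum the scores of every rule that hit, applying per-user overrides."""
    scores = {**CONTENT_SCORES, **RBL_SCORES, **(user_overrides or {})}
    return sum(scores[name] for name in hits)


def is_spam(hits, user_overrides=None):
    """A message is tagged as spam once its total reaches the threshold."""
    return total_score(hits, user_overrides) >= THRESHOLD
```

With these assumed numbers, an RBL hit alone scores 0.5 and never tags a message, while a message that trips all four content rules crosses the threshold without any RBL input. A user who trusts the list more can pass an override such as `{"RBL_LISTED": 3.0}` to give it more weight at their own site.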