At 11:16 PM 7/31/03 -0400, Larry Gilson wrote:

Have these statistics been fairly true over time or do they fluctuate?  It
seems that the statistics are point-in-time numbers that tell more about the
reported spam rather than the effectiveness of the specific check.  While
the corpus of messages fed to create these statistics may be the same for
all three tests, the corpus of checksums for each check are different as the
pieces of and number of reported spam will be different.  Is this true?  I
hope I articulated my thought correctly.  If my train of thought is correct,
then it would be beneficial to run all three tests rather than just one or
two.  Thoughts?

Certainly they fluctuate a bit. However, the corpus is a realistic pile of fresh, real-world spam, and the checks are run against the real-world public razor, dcc and pyzor servers. They aren't run against a private database or in any kind of artificial frictionless vacuum. They're also based on a pretty large corpus (192,687 messages, 53% of which are spam). So those statistics are going to be fairly representative of real-world numbers.


As for having different results, certainly there's going to be a fair number of non-overlapping messages (i.e. something only in pyzor and not in razor, etc.). So there is some benefit to running them all. There's also a drawback: each of these is a network test, so the more of them you run, the slower spamassassin goes. These systems are generally more CPU intensive and slower than DNSBLs (excluding outages).
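To make the overlap point concrete, here is a rough Python sketch of how you could compare per-check hit rates against the combined "any check fired" rate. The message IDs and the `coverage` helper are made up for illustration; they are not taken from the actual corpus or from SpamAssassin itself.

```python
def coverage(hits_by_check, total_spam):
    """Return per-check and combined hit rates as fractions of total_spam."""
    rates = {name: len(ids) / total_spam for name, ids in hits_by_check.items()}
    # A message counts toward "any" if at least one check flagged it.
    combined = set().union(*hits_by_check.values())
    rates["any"] = len(combined) / total_spam
    return rates

# Hypothetical message IDs flagged by each check (illustration only).
hits = {
    "razor2": {1, 2, 3, 4},
    "dcc": {3, 4, 5, 6},
    "pyzor": {4, 7},
}
rates = coverage(hits, total_spam=10)
print(rates)
```

The gap between the best single check and the "any" figure is the benefit you would be weighing against the extra network round trips.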

Also, you might want to look at policies. DCC has always been questionable to me for use in SpamAssassin. DCC is explicitly NOT a spam database. It IS explicitly a bulk mail database. So it is perfectly acceptable for subscriber-only newsletters to be reported to DCC. A lot of people have had problems when using it because it does flag some legit mass-mails. I'd certainly not trust it with a score over 1.0 myself. The fact that DCC doesn't do poorly in the corpus makes me wonder if the corpus is mostly devoid of subscriptions to things like motley fool, etc.
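If you do want to run all three but keep DCC on a short leash, something along these lines in local.cf is one way to do it. Treat it as a sketch: I'm assuming the rule is named DCC_CHECK in your version and that the use_* options are available, so check your own rules files before copying it.

```
# local.cf (sketch; rule and option names assumed, verify against your version)
use_razor2 1
use_pyzor  1
use_dcc    1

# Cap DCC's contribution, since it lists bulk mail rather than spam
score DCC_CHECK 1.0
```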

That said, I've often had problems with razor2 too, because some dimwit in the world keeps reporting the Versalogic newsletter, presumably because they decided to start using an old email address as a spamtrap. (Versalogic is an embedded computer hardware manufacturer with an opt-in only newsletter that I've never had trouble unsubscribing from when I wanted to. It only sends about 6 mails a year.)









