-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello maarten,

Friday, November 7, 2003, 1:25:05 PM, you wrote:

mvdB> ... Upon looking at those rules I see a LOT of inconsistencies.
mvdB> For instance, I found these rules that have score of zero(!) (and
mvdB> these are merely the top of a large iceberg)

mvdB> score CASHCASHCASH 0
mvdB> score ADDRESSES_ON_CD 0
mvdB> score BLANK_LINES_90_100 0
mvdB> score EJACULATION 0
mvdB> score HERBAL_V+AG+A 0

mvdB> One could argue that yelling CASH CASH CASH is a valid sales pitch
mvdB> in a normal mail. But hey, are we being realistic here ? How could
mvdB> anything but spam have this property ?

In my personal corpus, the CASHCASHCASH rule matches
* A personal response from SBCIS concerning spam abuse
* An email within a non-profit organization's internal mailing list
  concerning the cost of its annual convention
* A promotional email I receive as a member of a hotel's loyalty
  program
and 200 spam messages. Personally, I've changed the score for this rule
to 0.75 (against my spam threshold of 9.0).

mvdB> least some low figure but NOT equal zero...
mvdB> And... well I won't even go into the fifth rule... come on ;-)

This rule seems to have matched no ham in my corpus. I'm also curious,
though, where you got the 0 score from. On my system this rule scores
1.6.

mvdB> Well, I'll grant you that much although I did study it a fair
mvdB> amount. But let's look at another aspect here too. There is not a
mvdB> single rule that scores higher than 4.999. That is plain wrong in
mvdB> my book; ...

Me, I do not want any single distributed rule to flag something as spam
by itself. Most of the rules that I develop and add to my own system
are limited to 1/3 of my spam threshold. I strongly agree with the
developers that spam is not identified by a single rule, but instead by
a combination of characteristics, verified through a combination of
rules.

There are three exceptions:
* emails sent to a completely invalid email address are always spam.
  There is no such address here as [EMAIL PROTECTED], and so any email
  sent to that address gets a "spam without a doubt" score.
* emails sent from known spamming organizations. That's what the
  distributed blacklists are for (thanks again, William Stearns).
* emails which contain URI links to sites that do nothing but spam.

Blacklists are automatically scored 100. The other two I will
personally score anywhere from 4.5 (half my threshold) to 50 (over 5x
my threshold).

But the point is that *I* want to make this determination. I don't want
to trust anyone else's corpus to do this for me. (I even have a list of
blacklist addresses on William Stearns' public list which I remove each
and every time I update my download, since those are not considered
spam in my domain.)

mvdB> Not wanting to be a PITA ;-), I would almost start questioning the
mvdB> statistics file cause it seems not to reflect real-life situations.
mvdB> But hey, who am I ?

Not one of the mass check contributors yet, I can tell. :-)

Stick around. Learn how to use the masscheck capabilities (see the
masses directory within the SA distribution). Each time we move from
one major distribution to another (e.g. 2.6x to 2.70) there's a mass
check scoring round, and you can help by testing the new ruleset
against YOUR corpus.

mvdB> Of course. I know. The reason I started writing this in the first
mvdB> place is just _because_ I see so many messages that are SO full of
mvdB> spam signs, yet invariably score 4.90... And thus, they fall right
mvdB> through... :-((

Head for the Rules Emporium, or for the Wiki, and you'll see how many
of us beef up our own ruleset. You'll have those 4.90's scoring 14.90
without much delay.
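
To make the "combination of rules" argument concrete, here is a minimal
Python sketch of additive scoring. It is a toy model under stated
assumptions, not SpamAssassin's own code: the LOCAL_* rule names and
their scores are hypothetical, and only the 0.75 score for CASHCASHCASH,
the 9.0 threshold, the 1/3 cap and the blacklist score of 100 are taken
from this message.

```python
# Toy model of additive spam scoring. NOT SpamAssassin's implementation.
THRESHOLD = 9.0  # spam threshold mentioned in the message

# CASHCASHCASH = 0.75 is the local override described in the message.
# Every LOCAL_* name and value below is hypothetical, for illustration.
SCORES = {
    "CASHCASHCASH": 0.75,
    "LOCAL_RULE_A": 3.0,       # capped at 1/3 of the threshold
    "LOCAL_RULE_B": 3.0,
    "LOCAL_RULE_C": 3.0,
    "LOCAL_BLACKLIST": 100.0,  # deliberate exception: decisive alone
}


def total_score(hits, scores=SCORES):
    """Sum the scores of the rules that matched; unknown rules add 0."""
    return sum(scores.get(name, 0.0) for name in hits)


def is_spam(hits, threshold=THRESHOLD, scores=SCORES):
    """Flag a message once its total score reaches the threshold."""
    return total_score(hits, scores) >= threshold


if __name__ == "__main__":
    for hits in (
        ["CASHCASHCASH"],
        ["LOCAL_RULE_A", "LOCAL_RULE_B"],
        ["LOCAL_RULE_A", "LOCAL_RULE_B", "LOCAL_RULE_C"],
        ["LOCAL_BLACKLIST"],
    ):
        print(hits, total_score(hits), is_spam(hits))
```

With a cap of 1/3 of the threshold, no single ordinary rule (nor any
pair of them) can flag a message in this model; it takes three
independent hits, or one of the deliberately overweighted exceptions.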

Bob Menschel

-----BEGIN PGP SIGNATURE-----
Version: PGP 8.0

iQA/AwUBP6xZpZebK8E4qh1HEQItigCg/0WnfyUY0VGwQRFB218iMg9QTT0AmQGP
MhW7jBDhIpuuQe4yitQ+zox5
=p+5z
-----END PGP SIGNATURE-----

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
