"Tony L. Svanstrom" <[EMAIL PROTECTED]> writes: > It's an "agree to disagree"-situation, methinks;

You could be completely right, and the right course of action could be to remove all negative scoring rules. Further, I care less about being right than I do about improving SA. My goal here isn't really to convince you that you're wrong, but to explain my thinking and some of the general direction of SA development in this area (not that I speak for the other developers, of course).

> where I claim that the negative scores will hit unevenly (hugely
> benefiting some mailclients, indirectly very slightly hurting those of
> unknown/less known mailers)

The percentage of legitimate clients that we hit is very large. However, most of the MUA rules are there as prerequisites to detect forgery, not as compensation (negative scoring) rules.

Also, I think you are excessively focusing on the mail client rules. To be perfectly honest, I suspect those specific rules should probably go because they are too easy to forge. However, there are many other negative rules.

Some of the negative rules are needed to compensate for certain bad behaviors found in specific legitimate clients; otherwise we would lose some of our otherwise effective spam-detection rules. For example, we have a negative-scoring rule for Evites.

Also, some negative rules are nearly foolproof, and a few planned ones will be even better (like Message-ID tracking). There's no reason to get rid of all negative rules, which is what you're claiming is a good idea -- with no basis, but lots of rhetoric.

I have data that shows these rules work well. For example:

OVERALL%   SPAM%     HAM%      S/O     RANK   SCORE  NAME
  66734    30802     35932     0.462   0.00    0.00  (all messages)
100.000    46.1564   53.8436   0.462   0.00    0.00  (all messages as %)
 11.087     0.0065   20.5861   0.000   0.99   -6.60  REFERENCES

(Corpus data from theo, rODbegbie, and myself.)

Supplement the REFERENCES one with a Message-ID tracking REFERENCE_SEEN test, and it's even better.

I suspect the attribution/quote rules will probably get phased out at some point.
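
As a rough cross-check of the hit-frequency figures quoted above, the columns can be recomputed from raw hit counts. This is only a sketch under stated assumptions: the function name `hit_stats` is hypothetical, the S/O formula (spam hits divided by all hits) is an assumption that happens to reproduce the 0.462 in the "(all messages)" row, and the REFERENCES hit counts (2 spam, 7397 ham) are back-derived from the quoted percentages rather than taken from the actual corpus.

```python
# Corpus sizes taken from the "(all messages)" row quoted above.
TOTAL_SPAM = 30802
TOTAL_HAM = 35932


def hit_stats(spam_hits, ham_hits, total_spam=TOTAL_SPAM, total_ham=TOTAL_HAM):
    """Recompute the OVERALL%, SPAM%, HAM% and S/O columns for one rule.

    Assumption: S/O = spam hits / (spam hits + ham hits). This is inferred
    from the table, not taken from the SpamAssassin source.
    """
    total = total_spam + total_ham
    hits = spam_hits + ham_hits
    return {
        "overall_pct": 100.0 * hits / total,
        "spam_pct": 100.0 * spam_hits / total_spam,
        "ham_pct": 100.0 * ham_hits / total_ham,
        "s_o": spam_hits / hits if hits else 0.0,
    }


# Every message "hits" the all-messages pseudo-rule.
all_messages = hit_stats(TOTAL_SPAM, TOTAL_HAM)

# Assumed hit counts, back-derived from 0.0065% of spam and 20.5861% of ham.
references = hit_stats(spam_hits=2, ham_hits=7397)

if __name__ == "__main__":
    print(f"all messages S/O: {all_messages['s_o']:.3f}")
    print(
        f"REFERENCES: overall={references['overall_pct']:.3f}% "
        f"spam={references['spam_pct']:.4f}% "
        f"ham={references['ham_pct']:.4f}% "
        f"S/O={references['s_o']:.3f}"
    )
```

Under those assumptions, a rule that fires on about a fifth of all ham and on only a couple of spam messages out of roughly thirty thousand ends up with an S/O that rounds to 0.000, which is the property the REFERENCES row illustrates.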

Also, I think the negative scores are too low, but again, we're already talking about fixing that for 2.5x and 2.60.

> and at the same time allow the more clever spammers to bypass SA (as
> well as those less clueful spammers that just follow the current
> trends), while the local gods (meant in a very friendly and nice way)
> claim that at the bottomline the good outweights the bad.
> Sure, I could try to prove it, but then I'd need to set up an as close
> to as possible identical environment to that of the SA-developers; and
> unless the resulting data would show anything hugely different the
> data would just be discared as not proving anything relevant.

Not really. Yes, you need a good corpus and you need to use statistics rather than guesses, but we don't have a monopoly on those things. We can't just make changes because you *believe* some statistical fact is true without actually having the statistical data.

> The second problem would be that I claim that it will even things out
> for most people, which results in that even if I'm right the
> endresults might be very very close to that of a "standard" SA if you
> just run it with a large enough corpus...

If the results are going to be the same, then why should anyone care? Maybe it would save me some time discussing the issue, but that's just speculation. Even more people might complain on the other side of the issue if we made that change for no reason. :-)

> [...]
>> Finally, does anyone have evidence that our FN rates are going up from
>> release to release? Mine certainly aren't. (Objective results matter.)
> I think it's a combination of two things already talked about on satalk:
>
> #1: An increase in spamtraffic.
> #2: More and more ISPs using spamfilters.
>
> The result is that to some people it will feel like they're getting
> the same number of spam, but more FPs; a situation caused by ISP
> already having removed the easiest to spot spams.

When ISPs use spamfilters (and then the user runs SpamAssassin), it's the SpamAssassin FN rates, not FP rates, that skyrocket. Spam traffic is higher, which is why I still receive about one uncaught spam per day, but I get 2-3x as much spam as a year ago.

But I agree that these are factors that negatively affect user perception of our performance. I guess when you're close to perfect, people will start demanding perfection. ;-)

> Personally I'd also like to think that it's becaused people are being
> hit by different groups of spammers, which are more or less clever
> when it comes to designing their spam... Something that would to some
> extent support my claim that negative scores should be avoided.

To paraphrase Craig: we don't have to catch spam from every spammer. We just have to catch spam from most (or the average) spammers. The cost of catching 100.0% of spam is a significant number of false positives because *some* spam will always manage to look sufficiently similar to ham.

> That last part is pure guessing and, IMNSHO, borderline trollish. =)

I think it's the earlier part of your message that was the troll, actually. You can't win an argument by claiming it's unwinnable and therefore you must be right. ;-)

Daniel

-- 
Daniel Quinlan                      anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk