On Sat, 2009-10-03 at 00:25 +0200, mouss wrote:
> Karsten Bräckelmann wrote:
> > > > False positive. Something, that matches (positive) the criterion for a
> > > > certain test, but should not (false).
> >
> > I stand by what I said.
>
> I'm not surprised:)

;)

> > IFF you are talking about the black box that spam detection is, that is
> > true.
> >
> > If you are talking about a rule like FORGED_MUA_OUTLOOK, it appears to
> > be that simple. However, it is not. You are looking at a single test,
> > which -- if positive -- either is correct or wrong.
>
> I understand the rationale, but I find this too abstract for "common"
> discussions.

*shrug*  You're not obliged to participate in a thread, if it is
confusing to you. That's the wonders of open discussion and diverse
input. You might stumble upon something you didn't know before... ;)

> > Same for RCVD_IN_DNSWL. If it positively matches, it either is
> > correct, or wrong. A false positive is a match, that is wrong. No matter
> > the score you assign the test.
>
> except that it depends what the test really means. dnswl doesn't mean
> the listed hosts never send spam. I am happy that it lists debian list
> servers, Orange, ... etc.

Exactly, in the context of a single rule (as opposed to "detecting
spam"), it depends on what the rule really means. Or in short, its
score's sign...

> > This concept is NOT specific to spam detection, or even computer
> > science. As a matter of fact, when I first really grasped the concept, a
> > medical scientist explained it to me.
>
> now that you say it, this is true. I too believe that medical science has
> precedence in this area.
>
> > Yes, a FP for a rule that identifies *ham* actually evaluated positive
> > on a spam. It only appears to be spam centric on this list, cause it is
> > mainly dedicated to identifying spam, not ham.
> >
> > You might want to ask wikipedia as well. And don't focus on the spam
> > filtering *example*, which again exclusively talks about a rule
> > identifying spam. Not ham.
>
> my point was that in a spam oriented forum, the meaning of some words is
> what "most of us" (yes, this is hard to define) think they mean. the
> principle of least astonishment.

Of course, these terms mostly come up WRT the overall score of a
message, which applies to "detecting spam". However, on this very list,
they are also commonly applied to single rules FP'ing, *without* pushing
the ham above the required_score threshold.

The only aspect new and obviously confusing to some regulars on this
list is the negative sign of the rule's score. Inverting the "is spam"
test logic also inverts the meaning of F[PN]. Whether one likes this or
not. It's all about context.

And FWIW, it is wrong to base your definitions on what the majority
thinks is correct. The majority and what's believed to be "common
knowledge" too often is wrong. You can observe this in real life,
too... I prefer to educate the masses instead.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
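
PS: To make the sign inversion concrete, here is a minimal Python sketch
of how a single rule evaluation could be labelled. It is an illustration
only, not anything SpamAssassin itself does. The function name
classify_hit and the scores used below are made up for the example, and
it assumes a non-zero score whose sign says whether the rule is meant to
identify spam (positive) or ham (negative).

```python
def classify_hit(rule_score, rule_matched, message_is_spam):
    """Label one rule evaluation as "TP", "FP", "TN" or "FN".

    Assumption for this sketch: a positive score means the rule is
    meant to identify spam, a negative score means it is meant to
    identify ham. A score of zero is not handled.
    """
    # Which class the rule claims to identify when it matches.
    rule_targets_spam = rule_score > 0
    # Is the message actually in the class the rule is looking for?
    in_target_class = (message_is_spam == rule_targets_spam)

    if rule_matched:
        # A match is "positive"; it is false if the message is not
        # in the rule's target class.
        return "TP" if in_target_class else "FP"
    # No match is "negative"; it is false if the rule should have hit.
    return "FN" if in_target_class else "TN"


# Hypothetical scores, chosen only for their sign.
FORGED_MUA_OUTLOOK = 2.0   # spam rule
RCVD_IN_DNSWL = -1.0       # ham rule

# Spam rule hitting a ham: the familiar kind of FP.
spam_rule_on_ham = classify_hit(FORGED_MUA_OUTLOOK, True, False)
# Ham rule hitting a spam: also a FP, for that rule.
ham_rule_on_spam = classify_hit(RCVD_IN_DNSWL, True, True)
# Ham rule hitting a ham: a correct match.
ham_rule_on_ham = classify_hit(RCVD_IN_DNSWL, True, False)
```

Note that the label depends only on the sign of the score and on
whether the match was right. The magnitude of the score, and whether the
message ends up above required_score, never enters into it.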