Any chance of getting a rescoring run for the SURBL lists?

--Chris
(Perceptron is on my list of things to master.)

>-----Original Message-----
>From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, September 29, 2004 1:26 PM
>To: Matt Kettler
>Cc: Chris Santerre; users@spamassassin.apache.org
>Subject: Re: Why such a low score?
>
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>
>What Matt said ;) the perceptron really hates FPs.
>
>Also, another feature of the perceptron is that, if two rules hit the same
>spams and the same hams, it'll spread the scores equally between those two
>rules.
>
>e.g.: if RULE_1 hits a certain set of spam, it may get a score of 3.0. But
>if RULE_1 and RULE_2 both hit the same subset, the score will be spread
>over the two, and each will get 1.5.
>
>If RULE_2 hit the same mails, but hit more hams, its score will be reduced
>and more score given to RULE_1.
>
>afaik...
>
>- --j.
>
>Matt Kettler writes:
>> At 10:55 AM 9/29/2004, Chris Santerre wrote:
>> >What was the reason WS got such a low score in SA 3.0??? .5 is a joke! Hell
>> >BigEvil was scored a 3 and no one complained, and it is the same data!! I
>> >don't understand. Did the mass check not go well?
>>
>> Upon closer inspection, the WS mass-check went pretty well, but WS had the
>> greatest number of nonspam hits of all the SURBL lists. It also hit the
>> most spam, but the OB list hit nearly as much spam, and almost no nonspam.
>>
>> Since the GA treats FPs as 100 times worse than FNs, the GA is going to
>> heavily bias the score of any overlapping spam hits to the one that has the
>> least nonspam hits. I suspect that in the spam cases, most of the WS hits
>> also hit either OB or SC, which have better FP ratios, and the scores
>> assigned reflect this.
>>
>> Admittedly the amount of nonspam WS hit is small (0.4%), but that's over 6
>> times more nonspam than OB did, and 100 times more than SC did.
>>
>> Thus WS got a lowish score not for being a bad rule, but for not doing as
>> well as its neighbors that catch the same spams.
>>
>> From STATISTICS-set1.txt
>> OVERALL%     SPAM%     HAM%     S/O   RANK   SCORE  NAME
>>   10.497   15.8904   0.0008   1.000   0.98    2.01  URIBL_AB_SURBL
>>   18.019   27.2741   0.0046   1.000   0.97    3.90  URIBL_SC_SURBL
>>   49.029   74.1861   0.0654   0.999   0.74    2.00  URIBL_OB_SURBL
>>   51.999   78.4712   0.4756   0.994   0.45    0.54  URIBL_WS_SURBL
>>    0.010    0.0146   0.0012   0.927   0.39    0.84  URIBL_PH_SURBL
>>
>> From STATISTICS-set3.txt:
>> OVERALL%     SPAM%     HAM%     S/O   RANK   SCORE  NAME
>>    7.022   14.4233   0.0061   1.000   0.95    4.26  URIBL_SC_SURBL
>>   30.471   62.5514   0.0632   0.999   0.74    3.21  URIBL_OB_SURBL
>>    2.950    6.0208   0.0385   0.994   0.73    0.42  URIBL_AB_SURBL
>>   33.807   68.9994   0.4494   0.994   0.47    1.46  URIBL_WS_SURBL
>>    0.019    0.0390   0.0008   0.981   0.44    2.00  URIBL_PH_SURBL
>>
>> grep SURBL 50_scores.cf:
>> score URIBL_AB_SURBL 0 2.007 0 0.417
>> score URIBL_OB_SURBL 0 1.996 0 3.213
>> score URIBL_PH_SURBL 0 0.839 0 2.000
>> score URIBL_SC_SURBL 0 3.897 0 4.263
>> score URIBL_WS_SURBL 0 0.539 0 1.462
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.2.4 (GNU/Linux)
>Comment: Exmh CVS
>
>iD8DBQFBWvA8QTcbUG5Y7woRAkoRAJ9SxRe1x/0wID7By10Cz8uZn8v2iQCfaYAb
>eVpsm+sn6fIrpwvhwwmHnmc=
>=RWRo
>-----END PGP SIGNATURE-----
>
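
For anyone who wants to check the "over 6 times" and "100 times" nonspam ratios quoted above, here is a rough sketch that compares the HAM% column from the STATISTICS-set1.txt table. The numbers are copied by hand from the quoted table rather than read from the real file, and the names HAM_PCT_SET1 and ham_ratio are made up for illustration; nothing here is part of SpamAssassin.

```python
# HAM% (percentage of nonspam messages hit) for each SURBL rule, copied by
# hand from the STATISTICS-set1.txt table quoted above.
# The dict and function names are illustrative, not SpamAssassin code.
HAM_PCT_SET1 = {
    "URIBL_AB_SURBL": 0.0008,
    "URIBL_SC_SURBL": 0.0046,
    "URIBL_OB_SURBL": 0.0654,
    "URIBL_WS_SURBL": 0.4756,
    "URIBL_PH_SURBL": 0.0012,
}


def ham_ratio(rule, other, table=HAM_PCT_SET1):
    """Return how many times more nonspam `rule` hit than `other`."""
    return table[rule] / table[other]


if __name__ == "__main__":
    # Compare WS against the two lists that overlap most of its spam hits.
    for other in ("URIBL_OB_SURBL", "URIBL_SC_SURBL"):
        ratio = ham_ratio("URIBL_WS_SURBL", other)
        print(f"WS hit {ratio:.1f}x the nonspam that {other} did")
```

On the set1 figures this comes out at roughly 7x for OB and a little over 100x for SC, which lines up with the explanation of why WS ended up with the lowest score of the overlapping lists.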