In the corpus, LINE_OF_YELLING appears almost 9000 times in spam and about 1300
times in nonspam.  So I'm guessing that when it shows up in nonspam, there are
other telltales that the message isn't really spam, and those rules have been
assigned negative scores by the GA.  There are only 562 false positives in the
whole corpus, so at least 700 or so of the LINE_OF_YELLING nonspam messages made
it through OK.  This does raise an interesting issue, though: it may be worth
looking at the false positives and false negatives after a GA run to see which
rules are most commonly triggered in each set -- maybe 500 of those 562 false
positives are triggering on LINE_OF_YELLING or something (though in that case I
would expect the GA to have been smarter about scoring it so high).  I'll take
a look at that.
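For what it's worth, here is a rough sketch of the kind of tally I have in
mind.  It is not part of SpamAssassin or the GA tooling; it assumes each corpus
message has already been reduced to a (is_spam, score, rules) tuple, and the
function name, the 5.0 threshold, and the RULE_A / RULE_B names are placeholders
made up for illustration.

```python
from collections import Counter

def rule_hit_counts(messages, threshold=5.0):
    """Tally which rules fire in false positives and false negatives.

    messages:  iterable of (is_spam, score, rules) tuples (assumed format).
    threshold: score at or above which a message is flagged as spam
               (assumed value; use whatever the GA run was scored against).
    """
    false_pos, false_neg = Counter(), Counter()
    for is_spam, score, rules in messages:
        flagged = score >= threshold
        if flagged and not is_spam:
            # Nonspam that was flagged: count each rule once per message.
            false_pos.update(set(rules))
        elif is_spam and not flagged:
            # Spam that slipped through.
            false_neg.update(set(rules))
    return false_pos, false_neg

# Tiny made-up sample, only to show the shape of the input.
sample = [
    (False, 6.1, ["LINE_OF_YELLING", "RULE_A"]),  # false positive
    (False, 5.4, ["LINE_OF_YELLING"]),            # false positive
    (False, 1.2, ["LINE_OF_YELLING", "RULE_B"]),  # nonspam, passed
    (True, 3.0, ["RULE_A"]),                      # false negative
    (True, 9.5, ["LINE_OF_YELLING", "RULE_A"]),   # spam, caught
]

fp, fn = rule_hit_counts(sample)
print("false positives:", fp.most_common())
print("false negatives:", fn.most_common())
```

If one rule dominates the false-positive tally the way I speculated above, that
would be a good hint that its score deserves a second look.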

C

Daniel Rogers wrote:

> Date: Wed, 27 Feb 2002 11:35:03 -0800
> From: Daniel Rogers <[EMAIL PROTECTED]>
> To: [EMAIL PROTECTED]
> Subject: [SAtalk] LINE_OF_YELLING
> 
> LINE_OF_YELLING seems to have jumped from a score of 0.70 in SA 2.01 to a
> score of 5.442 in SA 2.1.  This strikes me as rather a lot.  Aren't there
> still people who still write their messages all in caps because they don't
> know any better?
> 
> Also, any mail that uses a line of all caps as a title (such as NTK) would
> get immediately marked as spam.
> 
> Dan.
> 
> _______________________________________________
> Spamassassin-talk mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk


