On Wed, Feb 27, 2002 at 05:00:29PM -0800, Craig R Hughes wrote:
> Yes, the large rule scores probably do make the system more sensitive to minor 
> variations in input.  However, they also apparently lead to more accurate 
> scores.  It is interesting that even running unconstrained over 50,000 
> generations of scores, no score ended up larger than about 20, and that one is 
> one which clearly is a spam-marker (PORN_1), and only a handful scored more than 
> +/-10.  You guys are right though that we don't want to accidentally overfit by 
> assigning huge scores.

Here's the highest total I've seen on a single message with 2.1:

SPAM: Content analysis details:   (43.1 hits, 5 required)
SPAM: Hit! (0.7 points)  From: does not include a real name
SPAM: Hit! (0.6 points)  Received via SMTPD32 server (SMTPD32-n.n)
SPAM: Hit! (1.9 points)  Subject is all capitals
SPAM: Hit! (1.4 points)  Subject: contains a question mark
SPAM: Hit! (3.0 points)  Subject contains lots of white space
SPAM: Hit! (9.9 points)  Received: contains a name with a faked IP-address
SPAM: Hit! (2.0 points)  BODY: Asks you to click below
SPAM: Hit! (2.6 points)  BODY: Claims you can be removed from the list
SPAM: Hit! (2.0 points)  BODY: Tells you to click on a URL
SPAM: Hit! (4.9 points)  BODY: URL of page called "remove"
SPAM: Hit! (6.5 points)  BODY: Link to a URL containing "remove"
SPAM: Hit! (2.0 points)  HTML-only mail, with no text version
SPAM: Hit! (3.4 points)  Forged 'by gw05' 'Received:' header found
SPAM: Hit! (0.6 points)  From and To the same address
SPAM: Hit! (1.6 points)  Subject contains a unique ID number
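
For anyone who wants to check the arithmetic, the per-rule points can be re-added with a few lines of Python. This is only a sketch for this particular report layout: the regular expressions and the `summarize` helper are my own assumptions, not anything shipped with SpamAssassin.

```python
import re
from decimal import Decimal

# The report exactly as pasted above.
REPORT = """\
SPAM: Content analysis details:   (43.1 hits, 5 required)
SPAM: Hit! (0.7 points)  From: does not include a real name
SPAM: Hit! (0.6 points)  Received via SMTPD32 server (SMTPD32-n.n)
SPAM: Hit! (1.9 points)  Subject is all capitals
SPAM: Hit! (1.4 points)  Subject: contains a question mark
SPAM: Hit! (3.0 points)  Subject contains lots of white space
SPAM: Hit! (9.9 points)  Received: contains a name with a faked IP-address
SPAM: Hit! (2.0 points)  BODY: Asks you to click below
SPAM: Hit! (2.6 points)  BODY: Claims you can be removed from the list
SPAM: Hit! (2.0 points)  BODY: Tells you to click on a URL
SPAM: Hit! (4.9 points)  BODY: URL of page called "remove"
SPAM: Hit! (6.5 points)  BODY: Link to a URL containing "remove"
SPAM: Hit! (2.0 points)  HTML-only mail, with no text version
SPAM: Hit! (3.4 points)  Forged 'by gw05' 'Received:' header found
SPAM: Hit! (0.6 points)  From and To the same address
SPAM: Hit! (1.6 points)  Subject contains a unique ID number
"""

# Hypothetical patterns, written only to match the layout shown above.
HIT_RE = re.compile(r"^SPAM: Hit! \((-?\d+(?:\.\d+)?) points\)\s+(.*)$")
SUMMARY_RE = re.compile(r"\((-?\d+(?:\.\d+)?) hits, (\d+(?:\.\d+)?) required\)")


def summarize(report):
    """Return (summed points, reported total, required threshold, hits)."""
    hits = []
    reported = required = None
    for line in report.splitlines():
        m = HIT_RE.match(line)
        if m:
            # Decimal avoids float rounding noise when adding tenths.
            hits.append((Decimal(m.group(1)), m.group(2)))
            continue
        s = SUMMARY_RE.search(line)
        if s:
            reported = Decimal(s.group(1))
            required = Decimal(s.group(2))
    total = sum((points for points, _ in hits), Decimal("0"))
    return total, reported, required, hits


if __name__ == "__main__":
    total, reported, required, hits = summarize(REPORT)
    print("summed:", total, "reported:", reported, "required:", required)
    # Largest contributors first.
    for points, description in sorted(hits, reverse=True)[:3]:
        print(points, description)
```

The fifteen hits do sum to the reported 43.1, and the three largest rules (9.9, 6.5 and 4.9) account for 21.3 of that, roughly half the total.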

Impressive, no?

Dan.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk