On Wed, Feb 27, 2002 at 05:00:29PM -0800, Craig R Hughes wrote:
> Yes, the large rule scores probably do make the system more sensitive to minor
> variations in input. However, they also apparently lead to more accurate
> scores. It is interesting that even running unconstrained over 50,000
> generations of scores, no score ended up larger than about 20, and that one is
> one which clearly is a spam-marker (PORN_1), and only a handful scored more than
> +/-10. You guys are right though that we don't want to accidentally overfit by
> assigning huge scores.

Here's the highest I've seen with 2.1:

SPAM: Content analysis details:   (43.1 hits, 5 required)
SPAM: Hit! (0.7 points)  From: does not include a real name
SPAM: Hit! (0.6 points)  Received via SMTPD32 server (SMTPD32-n.n)
SPAM: Hit! (1.9 points)  Subject is all capitals
SPAM: Hit! (1.4 points)  Subject: contains a question mark
SPAM: Hit! (3.0 points)  Subject contains lots of white space
SPAM: Hit! (9.9 points)  Received: contains a name with a faked IP-address
SPAM: Hit! (2.0 points)  BODY: Asks you to click below
SPAM: Hit! (2.6 points)  BODY: Claims you can be removed from the list
SPAM: Hit! (2.0 points)  BODY: Tells you to click on a URL
SPAM: Hit! (4.9 points)  BODY: URL of page called "remove"
SPAM: Hit! (6.5 points)  BODY: Link to a URL containing "remove"
SPAM: Hit! (2.0 points)  HTML-only mail, with no text version
SPAM: Hit! (3.4 points)  Forged 'by gw05' 'Received:' header found
SPAM: Hit! (0.6 points)  From and To the same address
SPAM: Hit! (1.6 points)  Subject contains a unique ID number

Impressive, no?

Dan.

_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
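
For anyone who wants to check the arithmetic in the report above, here is a minimal Python sketch that parses the "SPAM: Hit!" lines, sums the points, and compares the total with the threshold. It is not part of SpamAssassin: the regular expressions and the helper name parse_report are assumptions made for illustration, based only on the line format shown in this message.

```python
import re

# The report text as it appears in the message above.
REPORT = """\
SPAM: Content analysis details:   (43.1 hits, 5 required)
SPAM: Hit! (0.7 points)  From: does not include a real name
SPAM: Hit! (0.6 points)  Received via SMTPD32 server (SMTPD32-n.n)
SPAM: Hit! (1.9 points)  Subject is all capitals
SPAM: Hit! (1.4 points)  Subject: contains a question mark
SPAM: Hit! (3.0 points)  Subject contains lots of white space
SPAM: Hit! (9.9 points)  Received: contains a name with a faked IP-address
SPAM: Hit! (2.0 points)  BODY: Asks you to click below
SPAM: Hit! (2.6 points)  BODY: Claims you can be removed from the list
SPAM: Hit! (2.0 points)  BODY: Tells you to click on a URL
SPAM: Hit! (4.9 points)  BODY: URL of page called "remove"
SPAM: Hit! (6.5 points)  BODY: Link to a URL containing "remove"
SPAM: Hit! (2.0 points)  HTML-only mail, with no text version
SPAM: Hit! (3.4 points)  Forged 'by gw05' 'Received:' header found
SPAM: Hit! (0.6 points)  From and To the same address
SPAM: Hit! (1.6 points)  Subject contains a unique ID number
"""

# Assumed patterns, derived from the report layout above (hypothetical,
# not taken from the SpamAssassin source).
HIT_RE = re.compile(r"^SPAM: Hit! \((-?\d+(?:\.\d+)?) points\)\s+(.*)$")
SUMMARY_RE = re.compile(r"\((-?\d+(?:\.\d+)?) hits, (-?\d+(?:\.\d+)?) required\)")


def parse_report(text):
    """Return (reported_total, required, [(score, description), ...])."""
    reported = required = None
    hits = []
    for line in text.splitlines():
        hit = HIT_RE.match(line)
        if hit:
            hits.append((float(hit.group(1)), hit.group(2)))
            continue
        summary = SUMMARY_RE.search(line)
        if summary:
            reported = float(summary.group(1))
            required = float(summary.group(2))
    return reported, required, hits


reported, required, hits = parse_report(REPORT)

# Round to one decimal place to hide floating-point noise in the sum.
total = round(sum(score for score, _ in hits), 1)

# The three largest contributions, highest first.
top_three = sorted(hits, key=lambda h: h[0], reverse=True)[:3]
top_three_total = round(sum(score for score, _ in top_three), 1)

print(f"rules hit: {len(hits)}")
print(f"summed score: {total} (reported: {reported}, required: {required})")
print(f"top three rules contribute: {top_three_total}")
for score, description in top_three:
    print(f"  {score:4.1f}  {description}")
```

The individual points do add up to the reported 43.1, and the three largest rules (9.9, 6.5 and 4.9 points) account for 21.3 of that, roughly half the total. That is the sensitivity to a few large scores that Craig's message is talking about.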