Hello mort+spamassassin,

Tuesday, September 7, 2004, 10:32:20 AM, you wrote:

> In message <[EMAIL PROTECTED]>, Robert Menschel writes:
>>LW> header SUB_UNDERSCORES Subject =~ /__/
>>LW> score SUB_UNDERSCORES 0.1
>>LW> But don't use it, or at least not with any significant score.

>>Well, actually, a quick scan of my corpus, 24k ham and 46k spam, shows 40
>>spam hits and no ham hits. IMO that could warrant a SARE score as high as
>>0.777 (my email client often gives different results than mass-check
>>does, so don't take this as gospel). Expect to see this in my next SARE
>>mass-check request, so we can see if it works on other corpora.

> I would advise against it. At least one big free email provider
> (yahoo.se, not sure about the rest of yahoo) will produce this kind of
> subject when you send quoted-printable encoded headers to and from it,
> due to a buggy QP-encoding.

Can you send me one or two examples of this for my corpus (with full
headers)?

As mentioned above, the rule has done well within SARE's testing:

> header SARE_SUB_2UNDERSCORES Subject =~ /__/
> describe SARE_SUB_2UNDERSCORES Subject contains consecutive underscores
> score SARE_SUB_2UNDERSCORES 0.652
> #hist SARE_SUB_2UNDERSCORES Loren Wilton in response to SA-Users query Aug 26 2004
> #counts SARE_SUB_2UNDERSCORES 31s/0h of 64199 corpus (39383s/24816h RM) 08/28/04
> #counts SARE_SUB_2UNDERSCORES 13s/0h of 18651 corpus (16120s/2531h MY) 08/29/04
> #counts SARE_SUB_2UNDERSCORES 8s/2h of 38751 corpus (15270s/23481h JH-SA3.0rc1) 08/30/04

That's only 2 ham vs 52 spam. If you have more counter-examples we'd like
to include them in our scoring algorithm, to help avoid FPs.

Thanks.

Bob Menschel
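
P.S. For anyone who wants to check the totals above, here is a minimal Python sketch that sums the three #counts lines and applies the same /__/ pattern to a Subject string. The function names (tally, subject_hits) are hypothetical, and the sketch assumes Python's re module treats this trivial pattern the same way SpamAssassin's Perl regex does. It does no MIME header decoding, so it is not a substitute for a real mass-check run.

```python
import re

# The three "#counts" annotations from the rule above, one per corpus.
COUNTS = """\
#counts SARE_SUB_2UNDERSCORES 31s/0h of 64199 corpus (39383s/24816h RM) 08/28/04
#counts SARE_SUB_2UNDERSCORES 13s/0h of 18651 corpus (16120s/2531h MY) 08/29/04
#counts SARE_SUB_2UNDERSCORES 8s/2h of 38751 corpus (15270s/23481h JH-SA3.0rc1) 08/30/04
"""

# Captures spam hits, ham hits, and corpus size from one "#counts" line.
COUNTS_RE = re.compile(r"#counts\s+\S+\s+(\d+)s/(\d+)h\s+of\s+(\d+)\s+corpus")

# Assumed Python equivalent of the rule body: Subject =~ /__/
RULE_RE = re.compile(r"__")


def tally(text):
    """Sum spam hits, ham hits, and corpus sizes over all #counts lines."""
    spam = ham = messages = 0
    for match in COUNTS_RE.finditer(text):
        spam += int(match.group(1))
        ham += int(match.group(2))
        messages += int(match.group(3))
    return spam, ham, messages


def subject_hits(subject):
    """Return True if the subject contains two consecutive underscores."""
    return RULE_RE.search(subject) is not None


if __name__ == "__main__":
    spam, ham, messages = tally(COUNTS)
    print(f"{spam} spam / {ham} ham hits across {messages} messages")
```

The same subject_hits check could be run over any Subject lines collected as counter-examples, to see whether single underscores or only doubled ones are involved.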