Hello mort+spamassassin,

Tuesday, September 7, 2004, 10:32:20 AM, you wrote:

> In message <[EMAIL PROTECTED]>, Robert Menschel writes:
>>LW> header SUB_UNDERSCORES    Subject =~ /__/
>>LW> score    SUB_UNDERSCORES    0.1
>>LW> But don't use it, or at least not with any significant score.

>>Well, actually, a quick scan of my corpus, 24k ham and 46k spam, shows 40
>>spam hits and no ham hits. IMO that could warrant a SARE score as high as
>>0.777 (my email client often gives different results than mass-check
>>does, so don't take this as gospel). Expect to see this in my next SARE
>>mass-check request, so we can see if it works on other corpora.

> I would advise against it. At least one big free email provider
> (yahoo.se, not sure about the rest of yahoo) will produce this kind of
> subject when you send quoted-printable encoded headers to and from it,
> due to a buggy QP-encoding.
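
One way such subjects could arise, sketched in Python. This is an assumption for illustration only: the sample subject is made up, and the actual Yahoo behaviour is not shown in this thread. In RFC 2047 "Q" encoding an underscore stands for a space, so an encoded subject with two consecutive spaces carries "__" in its raw form. If something along the way leaves that text undecoded, a /__/ test would match it.

```python
import re
from email.header import decode_header

# Hypothetical Q-encoded subject; "_" stands for a space in RFC 2047 Q encoding.
raw = "=?iso-8859-1?Q?Hello__world?="

# Decode the single encoded word back to text.
decoded_bytes, charset = decode_header(raw)[0]
decoded = decoded_bytes.decode(charset)

# Same pattern as the rule under discussion: two consecutive underscores.
raw_hit = bool(re.search(r"__", raw))
decoded_hit = bool(re.search(r"__", decoded))

print(repr(decoded), raw_hit, decoded_hit)
```

Here the raw form matches the pattern and the properly decoded form does not, which is why real examples with full headers would help.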

Can you send me one or two examples of this for my corpus (with full
headers)? As mentioned above, the rule has done well within SARE's
testing:
> header    SARE_SUB_2UNDERSCORES    Subject =~ /__/
> describe  SARE_SUB_2UNDERSCORES    Subject contains consecutive underscores
> score     SARE_SUB_2UNDERSCORES    0.652
> #hist     SARE_SUB_2UNDERSCORES    Loren Wilton in response to SA-Users query Aug 26 2004
> #counts   SARE_SUB_2UNDERSCORES    31s/0h of 64199 corpus (39383s/24816h RM) 08/28/04
> #counts   SARE_SUB_2UNDERSCORES    13s/0h of 18651 corpus (16120s/2531h MY) 08/29/04
> #counts   SARE_SUB_2UNDERSCORES    8s/2h of 38751 corpus (15270s/23481h JH-SA3.0rc1) 08/30/04
That's only 2 ham vs 52 spam. If you have more counter-examples we'd like
to include them in our scoring algorithm, to help avoid FPs.
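
For anyone who wants to check the totals, here is a minimal Python sketch that sums the spam and ham hits from the #counts lines quoted above. The function name `tally` and the regular expression are my own illustration, not part of any SARE tooling.

```python
import re

# The three #counts lines from the rule file, one per corpus.
COUNTS = """\
#counts   SARE_SUB_2UNDERSCORES    31s/0h of 64199 corpus (39383s/24816h RM) 08/28/04
#counts   SARE_SUB_2UNDERSCORES    13s/0h of 18651 corpus (16120s/2531h MY) 08/29/04
#counts   SARE_SUB_2UNDERSCORES    8s/2h of 38751 corpus (15270s/23481h JH-SA3.0rc1) 08/30/04
"""

# Matches the rule hits ("31s/0h of 64199 corpus"), not the corpus
# composition in parentheses.
HITS = re.compile(r"(\d+)s/(\d+)h of (\d+) corpus")

def tally(text):
    """Return (spam_hits, ham_hits) summed over all #counts lines."""
    spam = ham = 0
    for match in HITS.finditer(text):
        spam += int(match.group(1))
        ham += int(match.group(2))
    return spam, ham

spam_hits, ham_hits = tally(COUNTS)
print(spam_hits, ham_hits)  # 52 2
```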

Thanks.

Bob Menschel


