> -----Original Message-----
> From: Ned Slider [mailto:[EMAIL PROTECTED]
> Sent: 1 October 2008 12:15 p.m.
> To: users@spamassassin.apache.org
> Subject: Re: False Positive on SUBJECT_FUZZY_TION rule
> 
> Ned Slider wrote:
> > Hi List,
> >
> > I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in
> > 25_replace.cf (SA 3.2.5, latest update):
> >
> >
> > header SUBJECT_FUZZY_TION       Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
> > describe SUBJECT_FUZZY_TION     Attempt to obfuscate words in Subject:
> > replace_rules SUBJECT_FUZZY_TION
> >
> >
> > is hitting on ham from a mailing list with the following subject line:
> >
> > Subject: Re: [CentOS] mount UFS partition on CentOS 5.
> >
> > My regex isn't good enough to understand exactly what this rule is
> > trying to achieve, but it looks to me like it targets some kind of
> > obfuscation of "tion" within a word. To my untrained eye it appears
> > to be hitting on "partition" in this case. A test email containing
> > just the text "partition" in the subject line also hits this rule,
> > which would appear to confirm my assumption.
> >
> > Could anyone help me understand what this rule is designed to hit,
> > and why it's hitting in this case?
> >
> > Thanks.
> >
> 
> 
> Replying to my own thread...
> 
> I'm assuming this rule is interpreting "tition" as an obfuscation of
> "tion", which is why it hits against "partition" as if it were an
> obfuscation of "partion".
> 
> Looking at some very crude stats for this rule against a recent corpus
> of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham
> and only 1 hit against spam (an obfuscation of "erection"). Admittedly,
> my ham corpus was a technical mailing list likely to contain the term
> "partition", given its common usage within IT, and triggering the rule
> in no way got close to tagging any ham as spam.
> 
> Anyway, to me this rule doesn't appear to represent good value so I'll
> probably just adjust the score to 0.001 and monitor it unless someone
> can suggest a method to prevent it hitting against legitimate words
> such as partition.

Hello Ned.

Lowering the score to a value that has no real effect on the total
score is a good way to test any rule. Your corpus test shows the rule
hitting far more ham than spam (13 hits against 1), so it doesn't really
work for your site. If it were my site, I'd disable the rule on the
strength of that corpus test.
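
For what it's worth, both options are one-line score overrides. This is
only a minimal sketch, assuming your site-wide overrides live in
local.cf (the file location is an assumption and varies by install):

```
# local.cf (assumed location of site-wide overrides)

# Option 1: keep the rule firing but make it irrelevant to the total,
# so hits still appear in reports while you monitor it.
score SUBJECT_FUZZY_TION 0.001

# Option 2: disable the rule outright. A score of 0 tells SpamAssassin
# to ignore the test. Uncomment this and remove the line above.
# score SUBJECT_FUZZY_TION 0
```

Running "spamassassin --lint" after the edit should confirm the
configuration still parses cleanly.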

Cheers,
Mike
