> -----Original Message-----
> From: Ned Slider [mailto:[EMAIL PROTECTED]
> Sent: 1 October 2008 12:15 p.m.
> To: users@spamassassin.apache.org
> Subject: Re: False Positive on SUBJECT_FUZZY_TION rule
>
> Ned Slider wrote:
> > Hi List,
> >
> > I'm getting some FP hits against the SUBJECT_FUZZY_TION rule in
> > 25_replace.cf (SA 3.2.5, latest update):
> >
> >
> > header SUBJECT_FUZZY_TION Subject =~ /<post P3>(?!tion)<T><I><O><N>/i
> > describe SUBJECT_FUZZY_TION Attempt to obfuscate words in Subject:
> > replace_rules SUBJECT_FUZZY_TION
> >
> >
> > is hitting on ham from a mailing list with the following subject line:
> >
> > Subject: Re: [CentOS] mount UFS partition on CentOS 5.
> >
> > My regex isn't good enough to understand exactly what this rule is
> > trying to achieve, but it looks to me like some kind of obfuscation of
> > "tion" within a word, but it appears to be hitting on "partition" in
> > this case to my untrained eye. A test email containing just the text
> > "partition" in the subject line also hits this rule so would appear to
> > confirm my assumptions.
> >
> > Could anyone help me understand what this rule is designed to hit, and
> > why it's hitting in this case?
> >
> > Thanks.
> >
>
> Replying to my own thread...
>
> I'm assuming this rule is interpreting "tition" as an obfuscation of
> "tion" hence why it hits against "partition" as if it were an
> obfuscation of "partion".
>
> Looking at some very crude stats for this rule against a recent corpus
> of ~1700 ham and ~1800 spam on my server, I see 13 FP hits against ham
> and only 1 hit against spam (an obfuscation of erection). Admittedly my
> ham corpus was a technical mailing list likely to contain the term
> "partition" given it's common usage within IT and triggering of the rule
> in no way got close to tagging any ham as spam.
>
> Anyway, to me this rule doesn't appear to represent good value so I'll
> probably just adjust the score to 0.001 and monitor it unless someone
> can suggest a method to prevent it hitting against legitimate words such
> as partition.

Hello Ned.

Lowering the score to a value too small to matter in the total score is
a good way to test any rule.

Your corpus test shows the rule hits far more ham than spam (13 hits
against 1), so it clearly doesn't work well for your site. If it were my
site, I'd disable the rule on the strength of that test.

Cheers,
Mike
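
For reference, a minimal sketch of the two options discussed in this
thread, written as local score overrides. It assumes site-wide overrides
live in a file such as local.cf (the exact location varies by
installation and is an assumption here). In SpamAssassin, a score of 0
disables a rule entirely.

```
# Option 1: keep the rule running so hits can still be monitored,
# but make its contribution to the total score negligible.
score SUBJECT_FUZZY_TION 0.001

# Option 2: disable the rule outright (a score of 0 means the rule
# is not evaluated). Uncomment this and remove the line above.
# score SUBJECT_FUZZY_TION 0
```

Running `spamassassin --lint` after editing the file is a quick way to
check that the configuration still parses cleanly.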