Karl Auer wrote:
> On Tue, 2006-11-14 at 09:58 -0500, Peter H. Lemieux wrote:
> > body __HAS_PENETRATION /\bpenetration\b/i
> I think a lot of rules would be better for losing the word boundaries.
> Very few of the worst "four letter words" are ever legitimate
> substrings, either.
I generally agree, Karl. In this particular instance I was suggesting a
patch to the 70_sare_adult ruleset and was following the patterns the
maintainer used for similar rules.
OTOH, I've had FP problems with simple word searches that don't include
word boundaries. A word like "sex" can match "sextuplets" or
"Middlesex". (The latter case brought this quickly to my attention some
years ago when I first started writing my own SA rules. Middlesex is a
county here in Massachusetts.) It's often hard to imagine all the
possible false positives that might arise from a particular string, so I
can understand why the publicly distributed rulesets like those from SARE
are so careful about word boundaries.
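
To make the difference concrete, here is a rough sketch in Python rather than an actual SA rule. The sample strings and the names unbounded/bounded are made up for illustration (only "Middlesex" and "sextuplets" come from the examples above), and I'm assuming \b behaves the same way in Perl for plain ASCII text like this.

```python
import re

# Hypothetical sample text; not taken from real mail.
samples = ["Middlesex County", "sextuplets born", "Sex sells, they say"]

# Same word, with and without word boundaries (case-insensitive, like /i).
unbounded = re.compile(r"sex", re.IGNORECASE)
bounded = re.compile(r"\bsex\b", re.IGNORECASE)

def hits(pattern, texts):
    """Return the texts in which the pattern matches anywhere."""
    return [t for t in texts if pattern.search(t)]

unbounded_hits = hits(unbounded, samples)  # matches all three samples
bounded_hits = hits(bounded, samples)      # matches only the standalone word

print(unbounded_hits)
print(bounded_hits)
```

Without the boundaries, the two innocent strings are flagged along with the third; with them, only the standalone word matches.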
Peter