Hi Keith,

Thanks for following the thread.

> -----Original Message-----
> From: Keith C. Ivey
> Sent: Tuesday, October 14, 2003 11:39 PM
> To: [EMAIL PROTECTED]
> Subject: RE: [SAtalk] Popcorn, Backhair, and Weeds
>
> Larry Gilson <[EMAIL PROTECTED]> wrote:
>
> > / \w{1,7}<\/?[^<>]{0,150}>\w{1,7}/
> >
> > This one seems to be working well so far. It will catch any normal
> > and funky stuff within the tags but makes sure it will not run over
> > any subsequent tags.
>
> Since '/' matches the character class '[^<>]', there's not much
> point in having the '\/?' in there. Also, since you're not
> looking for anything after the '\w{1,7}' at the end, you might
> as well change it to '\w', since the '{1,7}' isn't making any
> difference.

Good point, thanks!

> It seems to me that your rule is going to have a fair number of
> false positives, though. For example, '<br>' often shows up
> between words with no intervening whitespace, and depending on
> what's used to produce the HTML I wouldn't be that surprised to
> find other tags, like '<p>' or '<li>', with words on both
> sides. Are you not seeing FPs?

I have been seeing some FPs on the rule from <br>, but only in
legitimate messages where a break falls where it would not naturally
occur. Forwarded HTML messages are the most common case: the <br>
comes from a line wrap. I have not been seeing FPs on <p> and <li>.

I am having a difficult time developing an expression that says
"I want x but not y". It would be nice to have an 'egrep -v'-like
function. If one exists, I just don't know it.

--Larry
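
On the "x but not y" question: Perl regexes support a negative lookahead, `(?!...)`, which can behave a bit like 'egrep -v' inside a single pattern. The sketch below is written in Python only so that it is easy to run; its `re` module uses the same syntax for these constructs. It compares the rule as quoted above (leading space included) with the simplified form Keith suggested, and adds a hypothetical `EXCLUDING` variant that skips <br>, <p>, and <li>. The `EXCLUDING` pattern, the `check` helper, and the sample strings are made up for illustration and are not a tested SpamAssassin rule.

```python
import re

# Rule as quoted in the thread, including the leading space.
ORIGINAL = re.compile(r" \w{1,7}<\/?[^<>]{0,150}>\w{1,7}")

# Simplified as suggested: drop '\/?' and the trailing '{1,7}'.
SIMPLIFIED = re.compile(r" \w{1,7}<[^<>]{0,150}>\w")

# Hypothetical variant (an assumption, not from the thread): the negative
# lookahead rejects tags whose name is br, p, or li (opening or closing,
# with or without attributes) and lets every other tag through.
EXCLUDING = re.compile(
    r" \w{1,7}<(?!/?(?:br|p|li)\b)[^<>]{0,150}>\w",
    re.IGNORECASE,
)


def check(text):
    """Return (original, simplified, excluding) match results for text."""
    return (
        bool(ORIGINAL.search(text)),
        bool(SIMPLIFIED.search(text)),
        bool(EXCLUDING.search(text)),
    )


# Made-up sample strings for illustration.
samples = [
    "buy vi<!-- xyz -->agra now",   # comment splitting a word
    "first line<br>second line",    # line-wrap break between words
    "one item</li>next item",       # closing list tag between words
    "some text<pre>more text",      # 'pre' starts with 'p' but is not excluded
    "plain text with no tags",
]

if __name__ == "__main__":
    for s in samples:
        print(check(s), s)
```

The `\b` after the tag names is what keeps `<pre>` from being treated as `<p>`. Whether excluding those tags is the right trade-off depends on what the spam corpus actually contains, so treat the tag list as a placeholder.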