Hello Pierre,

Saturday, January 17, 2004, 6:28:37 PM, you wrote:

PT> Bob,

PT> Thanks for the mass check. I don't have a big corpus handy,
PT> just what trickles through the gateway.

PT> There should be no problem with a few extra keywords; we could
PT> even squeeze "postmaster" in there for good measure, though rules
PT> which line-wrap sometimes cause grief for text downloads.

>>Also, the valid HTML tags are valid regardless of case, eg:
>><BLOCKQUOTE> is a valid HTML tag, but excluded in your rule.

PT> My original test only looks for lowercase strings, so there is
PT> no need to make exceptions for valid uppercase tags. So far I have
PT> only seen lowercase bogus tags in spam. How does an overall /i
PT> modifier affect inverse matches anyhow? Will your version match
PT> <HAIRSPRAY> and not match <BLOCKQUOTE> ?

Good point. I've removed the /i from my copy. Results:

rawbody   PT_BOGUS_HTML /\<\/?(?!(?:blockquote|optiongroup|plaintext|fontfamily|underline))[a-z]{9,15}\>/
describe  PT_BOGUS_HTML random long words disguised as HTML tags
score     PT_BOGUS_HTML 4.000 # 9628s/2h of 92209 corpus (74874s/17335h) 01/17/04

Bob Menschel

_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
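
P.S. On the question of how /i interacts with the negative lookahead, here is a rough sketch for anyone who wants to poke at it. It uses Python's re module as a stand-in for the Perl regex engine that SpamAssassin actually uses, so treat it as an approximation and not as a test of the rule itself. The helper name hits and the sample tags are made up for illustration, and the pattern is the one from the rule with the backslashes before <, / and > dropped, since they are not needed in Python.

```python
import re

# Pattern from the PT_BOGUS_HTML rule, minus the redundant backslashes.
# Assumption: Python's re treats lookahead and case folding like Perl here.
PATTERN = (
    r"</?(?!(?:blockquote|optiongroup|plaintext|fontfamily|underline))"
    r"[a-z]{9,15}>"
)

case_sensitive = re.compile(PATTERN)                    # rule without /i
case_insensitive = re.compile(PATTERN, re.IGNORECASE)   # rule with /i


def hits(regex, text):
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return regex.search(text) is not None


samples = ["<hairspray>", "<HAIRSPRAY>", "<blockquote>", "<BLOCKQUOTE>"]
for tag in samples:
    without_i = hits(case_sensitive, tag)
    with_i = hits(case_insensitive, tag)
    print(f"{tag:14} without /i: {without_i!s:5}  with /i: {with_i}")
```

In this sketch the case-insensitive flag applies to the excluded keywords inside the lookahead as well as to [a-z], so with /i the pattern matches <HAIRSPRAY> and still skips <BLOCKQUOTE>. Without /i it only ever fires on lowercase tags such as <hairspray>. It would be worth confirming the same behaviour in Perl before relying on it.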