I made a rule that catches many of these bogus HTML tags, based on the fact that there are only three valid standalone tags of 9 characters or more (according to the list at http://devedge.netscape.com/library/xref/2001/html-element/ ):
# check for invalid HTML tags of 9 characters or more
rawbody  PT_BOGUS_HTML /\<\/?(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}\>/
describe PT_BOGUS_HTML random long words disguised as HTML tags
score    PT_BOGUS_HTML 1.0

Of course, it's possible that someone would put a long word in angle brackets in a legit email; it would be better to have a rule set that looks for multiple instances of this pattern.

You can make it stricter by removing the first ? in the regexp; then only "closing" HTML tags will be matched.

As always, YMMV; test new rules before using in production.

Does anyone have a better test for this?

Pierre Thomson
BIC

-----Original Message-----
From: Christian Recktenwald <spamassassin-talk-dist <at> citecs.de>
Subject: Filter rule f. invalid HTML tags?
Date: Mon, 05 Jan 2004 11:59:05 +0100

Hi,

I've recognized a lot of invalid HTML tags in several spam messages. According to w3.org there are 92 valid HTML tags defined for HTML 4.01. As far as I can see, such crud is not recognized by sa.

How about a rule looking for invalid html tags?

--
Christian Recktenwald     :                        :
citecs GmbH               : <spamassassin-talk-dist <at> citecs.de>
Unternehmensberatung fuer : voice +49 711 601 2090 : Boeblinger Strasse 189
EDV und Telekommunikation : fax   +49 711 601 2092 : D-70199 Stuttgart
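
For anyone who wants to try the PT_BOGUS_HTML pattern on sample bodies before loading it into SpamAssassin, here is a rough Python sketch of the same regex, plus the "multiple instances" idea mentioned above. This is not how SpamAssassin evaluates rules; the function names and the threshold of 3 hits are made up for illustration, and the regex is copied from the rule as posted (lowercase only, same three excluded tag names).

```python
import re

# Python port of the PT_BOGUS_HTML regex from the rule in this thread.
# The backslashes before < and > are dropped because they are not needed here.
BOGUS_TAG = re.compile(r"</?(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}>")

# Hypothetical threshold (an assumption, not from the original post):
# how many bogus tags a body needs before it is flagged.
MIN_HITS = 3


def count_bogus_tags(raw_body: str) -> int:
    """Count tag-like words of 9 to 15 lowercase letters that are not excluded."""
    return len(BOGUS_TAG.findall(raw_body))


def looks_bogus(raw_body: str, min_hits: int = MIN_HITS) -> bool:
    """Return True when the body contains at least min_hits bogus tags."""
    return count_bogus_tags(raw_body) >= min_hits


if __name__ == "__main__":
    spam = "V<xkqzvbnmw>ia<lkjhgfdsa>gra</qwertyuio>"
    ham = "<blockquote>I think it is <wonderful></blockquote>"
    print(count_bogus_tags(spam), looks_bogus(spam))
    print(count_bogus_tags(ham), looks_bogus(ham))
```

Requiring several hits means a single long word in angle brackets in a legit message is counted but not flagged, which is the false-positive case the post worries about.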