I made a rule that catches many of these bogus HTML tags, based on the fact that there 
are only three valid standalone tags of 9 characters or more (according to the list at 
http://devedge.netscape.com/library/xref/2001/html-element/ ):

# check for invalid HTML tags of 9 characters or more

rawbody PT_BOGUS_HTML    /\<\/?(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}\>/
describe PT_BOGUS_HTML   random long words disguised as HTML tags
score PT_BOGUS_HTML      1.0
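To see what the pattern does and does not catch, here is a rough Python translation of the PT_BOGUS_HTML regexp, run over a made-up sample body. This is only a sketch for testing the regexp outside SpamAssassin. The function name bogus_tags and the sample text are hypothetical. It also assumes that Python's re module handles this particular pattern (a negative lookahead plus a bounded character class) the same way Perl does.

```python
import re

# Python translation of the PT_BOGUS_HTML pattern. In Perl, "\<" and "\>"
# are just literal angle brackets, so plain < and > are used here.
BOGUS_TAG = re.compile(r"</?(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}>")


def bogus_tags(raw_body: str) -> list[str]:
    """Return every bogus-looking tag the pattern finds in a raw body."""
    return BOGUS_TAG.findall(raw_body)


# Hypothetical sample body, one case per line.
sample = (
    "<blockquote>quoted</blockquote>"   # valid 10-letter tag: excluded
    "Buy now<transcendental>cheap"      # 14 letters: flagged
    "</xylophonist>"                    # closing form, 11 letters: flagged
    "<table><fieldset>"                 # 5 and 8 letters: too short to match
    "<supercalifragilistic>"            # 20 letters: over the {9,15} limit
)
print(bogus_tags(sample))
```

With the sample above it prints ['<transcendental>', '</xylophonist>']. Note the {9,15} bound: a 20-letter fake tag slips through. Because [a-z] is case-sensitive, an uppercase fake tag such as <TRANSCENDENTAL> would not match either. Adding the /i modifier is the usual Perl way to relax that.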

Of course, someone could put a long word in angle brackets in a legitimate 
email; it would be better to have a rule set that looks for multiple instances 
of this pattern.  You can make the rule stricter by removing the first ? in the regexp 
(the one after \/); then only closing HTML tags will be matched.  As always, YMMV; 
test new rules before using them in production.
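The "multiple instances" idea and the closing-tag-only variant can be tried out the same way. The sketch below is Python, not SpamAssassin syntax. It counts matches and only reports a hit once a threshold is reached. MIN_HITS = 3 and the helper names are hypothetical choices for illustration, not tested values.

```python
import re

# PT_BOGUS_HTML as posted, and the stricter variant: without the "?" after
# "\/" the slash is mandatory, so only closing tags can match.
ANY_TAG = re.compile(r"</?(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}>")
CLOSING_ONLY = re.compile(r"</(?!(?:blockquote|optiongroup|plaintext))[a-z]{9,15}>")

# Hypothetical threshold: number of hits before a body counts as suspicious.
MIN_HITS = 3


def looks_bogus(raw_body: str, pattern: re.Pattern = ANY_TAG,
                min_hits: int = MIN_HITS) -> bool:
    """Return True if `pattern` matches at least `min_hits` times in the body."""
    return len(pattern.findall(raw_body)) >= min_hits


# One long word in brackets (<attachment>) in otherwise normal text.
legit = "Please read <mailto:someone> and the <attachment> below."
# Made-up, spam-like sample: several fake tags around the payload.
spam = "<xylophonist>cheap</transcendental>pills<quarterback>now</horsepower>"

print(looks_bogus(legit))               # False: only 1 hit
print(looks_bogus(spam))                # True: 4 hits
print(looks_bogus(spam, CLOSING_ONLY))  # False: only 2 closing tags
```

Here the legitimate sample trips the single-hit pattern once (on <attachment>) but stays under the threshold, while the spam-like sample has four matches. With the closing-only pattern the same spam sample drops to two matches. The stricter variant therefore trades fewer false positives for more misses.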

Does anyone have a better test for this?
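A different test, closer to the whitelist approach suggested in the quoted message below, is to pull out every tag name and compare it against the list of known HTML elements. In this sketch, KNOWN_TAGS is a deliberately partial, hypothetical subset; a usable version would need the full HTML 4.01 element list. The tag-name regexp is also an assumption about what counts as a tag.

```python
import re

# Hypothetical, deliberately partial whitelist. A real rule would need the
# full HTML 4.01 element list from the W3C specification.
KNOWN_TAGS = {
    "a", "b", "body", "br", "div", "font", "head", "html", "i", "img",
    "li", "p", "span", "table", "td", "title", "tr", "u", "ul",
}

# Capture the element name of an opening or closing tag. The name must start
# with a letter and be followed by whitespace, "/" or ">", so attributes are
# allowed but things like <someone@example.com> are not treated as tags.
TAG_NAME = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)(?=[\s/>])[^>]*>")


def unknown_tags(html: str) -> list[str]:
    """Return lowercased tag names not in KNOWN_TAGS, in order of appearance."""
    return [name.lower() for name in TAG_NAME.findall(html)
            if name.lower() not in KNOWN_TAGS]


sample = '<HTML><body><p>Hi<ravenous>there</ravenous><font color="red">x</font><br/>'
print(unknown_tags(sample))  # ['ravenous', 'ravenous']
```

This flags short fake tags such as <zqx> that the length-based rule cannot see. Comments and <!DOCTYPE ...> declarations are skipped because a tag name has to start with a letter. Address-style brackets such as <someone@example.com> are skipped because the name must be followed by whitespace, a slash or >. The catch is that every legitimate but unlisted element becomes a false positive, so the whitelist has to be complete.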


Pierre Thomson
BIC



-----Original Message-----
From: Christian Recktenwald <spamassassin-talk-dist <at> citecs.de>
Subject: Filter rule f. invalid HTML tags?
Date: Mon, 05 Jan 2004 11:59:05 +0100

Hi,

I've noticed a lot of invalid HTML tags in several spam 
messages. According to w3.org, HTML 4.01 defines 92 valid
HTML tags.

As far as I can see, SA does not catch such crud.

How about a rule that looks for invalid HTML tags?

-- 
Christian Recktenwald      :                         :
citecs GmbH                : <spamassassin-talk-dist <at> citecs.de>
Unternehmensberatung fuer  : voice +49 711 601 2090  : Boeblinger Strasse 189
EDV und Telekommunikation  : fax   +49 711 601 2092  : D-70199 Stuttgart





_______________________________________________
Spamassassin-talk mailing list
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
