Here's a bunch of rules I've put together, and partially pulled from here.
I thought I'd share and try to get some feedback on some of them, especially
the first one.  

Mike




#Text outside of the last </HTML> tag
#Example: </HTML> afejg32
#Unfortunately a newline breaks this.  If there is a way
#  to include newlines in this rule, it would catch a lot more.
rawbody  MK_BAD_HTML_1 /\<\/html\>\s{0,50}\S+\d{0,10}/i
describe MK_BAD_HTML_1 Bad HTML form.  Content after closing HTML tag
score      MK_BAD_HTML_1 1.8
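
#A possible way around the newline problem (untested sketch): a "full"
#rule is matched against the whole pristine message as one string, so \s
#can cross line breaks.  The (?!--) lookahead is there to skip a MIME
#boundary following </html>.  Because the message is not decoded, this
#will miss base64 parts and may misfire on quoted-printable soft line
#breaks.  The rule name and score are placeholders.
full     MK_BAD_HTML_1_FULL /\<\/html\>\s{0,50}(?!--)\S/i
describe MK_BAD_HTML_1_FULL Bad HTML form.  Content after closing HTML tag (multi-line)
score    MK_BAD_HTML_1_FULL 0.1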

#Using paragraphs and spaces to break lines.  
#Example:  <p align="center">&nbsp;</p> OR <p>&nbsp;
rawbody  MK_BAD_HTML_2 /\<p\s{0,50}?\S{0,50}\>\s{0,50}?\&nbsp\;/i
describe MK_BAD_HTML_2 Bad HTML form.  Breaking lines with paragraphs.
score      MK_BAD_HTML_2 0.1

#It is very uncommon for anyone to do this
rawbody  MK_BAD_HTML_3 /\t\<\/html\>/i
describe MK_BAD_HTML_3 Bad HTML form.  Tabbed your closing html tag.
score      MK_BAD_HTML_3 0.6

#Check for a beginning HTML tag <HTML>
rawbody  __MK_HTML_TAG_START /\<html/i

#Check for a closing HTML tag </html>
rawbody  __MK_HTML_TAG_END /\<\/html\>/i

#Check to see if the HTML message is made correctly.  Seeing a lot of SPAM
#that isn't.
meta     MK_BAD_HTML_4 HTML_MESSAGE && !__MK_HTML_TAG_START && !__MK_HTML_TAG_END
describe MK_BAD_HTML_4 Bad HTML form.  Doesn't have beginning or closing HTML tags.
score    MK_BAD_HTML_4 0.4

#Same as MK_BAD_HTML_4, except we just check for a beginning tag without an
#end tag
meta     MK_BAD_HTML_5  HTML_MESSAGE && __MK_HTML_TAG_START && !__MK_HTML_TAG_END
describe MK_BAD_HTML_5  Bad HTML form.  Has a beginning HTML tag and no end tag.
score    MK_BAD_HTML_5  0.3

#Same as MK_BAD_HTML_4, except we just check for an end tag without a
#beginning tag
meta       MK_BAD_HTML_6 HTML_MESSAGE && !__MK_HTML_TAG_START && __MK_HTML_TAG_END
describe MK_BAD_HTML_6 Bad HTML form.  Has an ending HTML tag and no beginning tag.
score      MK_BAD_HTML_6 0.3

#This takes care of <!asde>, but excludes 
#   <!DOCTYPE HTML ...
rawbody  __MK_BAD_HTML_7 /\<![a-zA-CE-Z]/
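
#A possible case-insensitive variant of __MK_BAD_HTML_7 (untested sketch).
#The lookahead skips <!DOCTYPE in any letter case, so <!doctype html> is
#not flagged while other <!D...> tags still are.  The _ALT name is only
#illustrative and is not used by the metas below.
rawbody  __MK_BAD_HTML_7_ALT /\<!(?!doctype)[a-z]/i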

#This takes care of tags that don't exist such as <zebra>
#The last / is in there so it doesn't freak out about closing tags.
#<KBD> is a valid tag, but I don't believe we'll see it in email so k is not
#in the list.
#Added in to not pickup <[EMAIL PROTECTED]>
rawbody  __MK_BAD_HTML_8 /\<[^abcdefhilmopstuv\/[EMAIL PROTECTED],80}\>/i

#This takes care of closing tags that don't exist such as </zebra>
rawbody  __MK_BAD_HTML_9 /\<\/[^abcdefhilmopstuv]/i

#8/4/2003 Added in due to MS Office <?xml:blahblah> tag
rawbody  __MK_GOOD_HTML_1 /\<\??xml/i

#The next three are a combo of the above three.  
meta       MK_BAD_HTML_10 HTML_MESSAGE && __MK_BAD_HTML_7
describe MK_BAD_HTML_10 Bad HTML form.  HTML Tag <!blah> that does not exist used.
score      MK_BAD_HTML_10 1.8

meta       MK_BAD_HTML_11 HTML_MESSAGE && __MK_BAD_HTML_8 && !__MK_GOOD_HTML_1
describe MK_BAD_HTML_11 Bad HTML form.  HTML beginning tag that does not exist used.
score      MK_BAD_HTML_11 0.3

meta       MK_BAD_HTML_12 HTML_MESSAGE && __MK_BAD_HTML_9
describe MK_BAD_HTML_12 Bad HTML form.  HTML closing tag that does not exist used.
score      MK_BAD_HTML_12 0.7


#Yahoo mail doesn't use beginning or closing html tags
header   __MK_FROM_YAHOO_1 Received =~ /mail\.yahoo\.com/i
header   __MK_FROM_YAHOO_2 From =~ /[EMAIL PROTECTED]/

meta       MK_VALID_LOOKING_YAHOO_MAIL_1 MK_BAD_HTML_4 && __MK_FROM_YAHOO_1 && __MK_FROM_YAHOO_2
describe MK_VALID_LOOKING_YAHOO_MAIL_1 Offsetting Yahoo! mail penalties.
score      MK_VALID_LOOKING_YAHOO_MAIL_1 -0.5



_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
