Hello Andrea,

Sunday, October 26, 2003, 6:08:02 PM, you wrote:

AR> Hi folks,

AR> Now I'm receiving spam:
AR> - "hi" as subject
AR> - without text in the body (html or not).

AR> Have you got any idea to filter those?

I think I've developed a set of filters for that.  These messages have
a content type of text/html (at least mine do) and no body (no text and
no HTML tags).  I have therefore created a set of rules which identify
the text/html content type and the lack of HTML tags.

I expect I'll submit these to the wiki and Emporium later this week,
after I've verified their accuracy here (they at least pass --lint, and
hit 74 spam and only 1 ham in a corpus of 39283 messages).

# RM rules to identify a completely empty HTML email
header   RM_hc_HTML      Content-Type =~ /text\/html/
describe RM_hc_HTML      Email is text/html format
score    RM_hc_HTML      0.001
rawbody  RM_rb_TITLE     /<title>/i
describe RM_rb_TITLE     Testing for HTML title in emails
score    RM_rb_TITLE     0.001
rawbody  RM_rb_BODY      /<BODY>/i
describe RM_rb_BODY      Testing for HTML BODY in emails
score    RM_rb_BODY      0.001
rawbody  RM_rb_HTML      /<HTML>/i
describe RM_rb_HTML      Testing for HTML tag in emails
score    RM_rb_HTML      0.001
rawbody  RM_rb_DIV       /<DIV>/i
describe RM_rb_DIV       Testing for HTML Div in emails
score    RM_rb_DIV       0.001
rawbody  RM_rb_PRE       /<PRE>/i
describe RM_rb_PRE       Testing for HTML PRE tag in emails
score    RM_rb_PRE       0.001
rawbody  RM_rb_PARA      /<P>/i
describe RM_rb_PARA      Testing for HTML Paragraph in emails
score    RM_rb_PARA      0.001
rawbody  RM_rb_FONT      /<\/FONT>/i
describe RM_rb_FONT      Testing for HTML Font tag in emails
score    RM_rb_FONT      0.001
rawbody  RM_rb_BREAK     /<BR>/i
describe RM_rb_BREAK     Testing for HTML Break in emails
score    RM_rb_BREAK     0.001
rawbody  RM_rb_ANCHOR    /<\/a>/i
describe RM_rb_ANCHOR    Testing for HTML end of anchor in emails
score    RM_rb_ANCHOR    0.001
meta     RM_m_EmptyHTML  RM_hc_HTML && !( RM_rb_TITLE || RM_rb_HTML || RM_rb_FONT || RM_rb_BODY || RM_rb_PRE || RM_rb_DIV || RM_rb_PARA || RM_rb_ANCHOR || RM_rb_BREAK )
describe RM_m_EmptyHTML  Email is HTML format, but no title, body, div, paragraph, nor break tags found
score    RM_m_EmptyHTML  1.37   # 74s/1h of 39283 corpus
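
For anyone who wants to try the same idea on a few saved messages
outside SpamAssassin, here is a rough Python sketch of the logic in the
meta rule above.  It is only an approximation: the function names
(raw_body, looks_like_empty_html) are made up for illustration, and
decoding each MIME part is an assumption about how closely this
resembles what rawbody rules actually see.

```python
import re
from email import message_from_string

# Tag patterns copied from the rawbody rules above (all case-insensitive).
TAG_RE = re.compile(
    r"<title>|<body>|<html>|<div>|<pre>|<p>|</font>|<br>|</a>",
    re.IGNORECASE,
)


def raw_body(msg):
    """Join the decoded text of every non-multipart part (an approximation)."""
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        data = part.get_payload(decode=True) or b""
        chunks.append(data.decode("latin-1"))
    return "\n".join(chunks)


def looks_like_empty_html(raw_message):
    """Hypothetical stand-in for RM_m_EmptyHTML: text/html header, no tags."""
    msg = message_from_string(raw_message)
    # Mirrors RM_hc_HTML: a case-sensitive match on the Content-Type header.
    if "text/html" not in msg.get("Content-Type", ""):
        return False
    # Mirrors the negated group of RM_rb_* rules: none of the tags may appear.
    return TAG_RE.search(raw_body(msg)) is None
```

Like the rules themselves, this only matches bare tags such as <BODY>,
so a tag written with attributes (for example <BODY bgcolor=...>) would
not count as a tag here either.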

Bob Menschel




_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
