[SAtalk] RFC - Blocking Based on Layout of HTML

Fox Flanders Tue, 01 Jul 2003 09:32:29 -0700

SpamAssassin has 'rawbody', 'full', and 'body' to represent ways you can
analyze a message.  I propose a fourth method of analysis, named something
like 'html_layout'.


A typical spam that gets through my filters looks something like this:

<div align="center">
<table><row><cell><a href><img src></a></cell></row></table>
text
<a href>text</a>
</div>

So you basically have a centered image wrapped in a link, plus a centered
unsubscribe link at the bottom.  I would suggest counting the different HTML
tags in the message and allowing the counts to be used in rule writing.

With a rule something like:

html_layout  SINGLE_IMG_LINK eval(.html_layout.a_href_img.count > 0 && .html_layout.table.count == 1)
describe     SINGLE_IMG_LINK Typical spam with just a single image url
score        SINGLE_IMG_LINK 3.5

you could block a lot of the 'image in the middle' spam.  Of course, the
spammers will just start adding in cruft HTML tags, and we will be back to
square one, as we are now with Bayesian filtering and the cruft words they
add at the bottom.  The more flexible and intelligent SA is, though, the
more difficult we can make it for spammers to construct spam.
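
To make the counting idea concrete, here is a rough sketch in Python (not
SpamAssassin code, and not a proposal for how SA should implement it).  The
names LayoutCounter, html_layout() and single_img_link() are made up for
illustration, and "a_href_img" is assumed to mean "an img tag that appears
inside an open a-href element".

```python
from collections import Counter
from html.parser import HTMLParser


class LayoutCounter(HTMLParser):
    """Counts start tags, plus images nested inside a link (a_href_img)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()
        # One entry per currently open <a>; True if that <a> had an href.
        self._open_anchors = []

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1
        if tag == "a":
            self._open_anchors.append("href" in dict(attrs))
        elif tag == "img" and any(self._open_anchors):
            # Assumption: any img inside an open <a href> counts once.
            self.counts["a_href_img"] += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._open_anchors:
            self._open_anchors.pop()


def html_layout(html):
    """Return a Counter of tag counts for one HTML message body."""
    parser = LayoutCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def single_img_link(counts):
    """Hypothetical equivalent of the SINGLE_IMG_LINK rule above."""
    return counts["a_href_img"] > 0 and counts["table"] == 1


# The schematic spam layout from this message.
SAMPLE = """<div align="center">
<table><row><cell><a href><img src></a></cell></row></table>
text
<a href>text</a>
</div>"""

if __name__ == "__main__":
    counts = html_layout(SAMPLE)
    print(dict(counts))
    print("SINGLE_IMG_LINK hit:", single_img_link(counts))
```

Keeping the counts in a plain dictionary means a rule only has to compare
numbers, which is about as much as the rule syntax above would need.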

Fox Flanders



