(Is this still "OT"?)

On Wed, 2 Jul 2003, Kai Schaetzl wrote:

> > The only way to extract the text as the viewer would see it is to use
> > the renderer of the viewer's mail client [impossible, given that SA
> > generally runs before the message is even delivered]
> 
> Well, I think one can do three things (all of them):
> 
> 1. just ignore all extra markup, or anything that looks like markup, so
> that you just get the text

SA already attempts to do that; it has both "body" and "rawbody" tests,
the former using the rendered text of the message.

One thing I'm not clear on is whether any tests look at intermediate
stages of decoding.  That is, if a message has a base64-encoded HTML body,
I think "rawbody" sees the base64 and "body" sees the rendered content,
but nothing sees the un-rendered HTML.  I'd be glad to learn I'm wrong
about that.
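To make the three stages concrete, here is a sketch using Python's standard email and html.parser modules, not SpamAssassin's Perl internals. It extracts the base64 text that, as assumed above, a "rawbody" test would see; the decoded but un-rendered HTML that (I think) no test sees; and the tag-stripped text a "body" test sees. TextExtractor and three_views are hypothetical names, and the tag stripping here is far cruder than a real renderer.

```python
import base64
import re
from email import message_from_string
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Crude renderer: keeps only character data and drops every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        # Collapse runs of whitespace roughly the way a browser would.
        return re.sub(r"\s+", " ", "".join(self.chunks)).strip()


def three_views(raw_message: str):
    """Return (transfer-encoded body, decoded HTML, rendered text)."""
    msg = message_from_string(raw_message)
    raw = msg.get_payload()                     # still base64-encoded
    html = msg.get_payload(decode=True).decode(  # transfer encoding removed
        msg.get_content_charset() or "us-ascii")
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return raw, html, parser.text()


html_body = "<html><body><b>Buy</b> now</body></html>"
sample = (
    "Content-Type: text/html; charset=us-ascii\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(html_body.encode()).decode() + "\n"
)

raw, html, text = three_views(sample)
print(raw.strip())   # the base64 blob
print(html)          # <html><body><b>Buy</b> now</body></html>
print(text)          # Buy now
```

The middle value is the one at issue: a rule that wanted to match on the HTML source itself (tags intact, encoding removed) would need a hook at that stage.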

> 2. specifically check for those nasty "workarounds" since they are a spam 
> indicator per se.

That's already done in many cases.  See, however, my concern above.
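One family of such workarounds is an empty element wedged into the middle of a word, like the <frame></frame> nonsense mentioned further down.  A minimal sketch of spotting it with a regex follows; the pattern and count_word_splitting_tags are hypothetical illustrations, not an actual SA rule, and a regex is only a rough tool for HTML.

```python
import re

# Hypothetical pattern: an element opened and immediately closed
# (e.g. <frame></frame>, <b></b>) with a letter directly on each side.
# Such markup renders as nothing; its only effect is to split a word so
# that a plain-text pattern like /viagra/ no longer matches the raw HTML.
EMPTY_TAG_IN_WORD = re.compile(
    r"(?<=[A-Za-z])"                    # a letter just before the tag
    r"<([A-Za-z][A-Za-z0-9]*)\b[^>]*>"  # opening tag, any attributes
    r"\s*</\1\s*>"                      # matching close tag, nothing between
    r"(?=[A-Za-z])",                    # a letter just after the tag
    re.IGNORECASE,
)


def count_word_splitting_tags(html: str) -> int:
    """Count empty tag pairs wedged between two letters."""
    return sum(1 for _ in EMPTY_TAG_IN_WORD.finditer(html))


print(count_word_splitting_tags("V<frame></frame>i<b></b>agra"))  # 2
print(count_word_splitting_tags("<p>Hello</p> <b>world</b>"))      # 0
```

The lookbehind and lookahead keep the surrounding letters out of the match, so consecutive splits inside one word are each counted.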

> 3. render the message or use some HTML-aware scanner to be able to mark 
> those spam-typical things like body="#000000", big fonts etc.

That is already done, and 2.60 has some nice heuristics, e.g. for finding
text whose color is close to that of the background.  However:

On Wed, 2 Jul 2003, Jim Ford wrote:

> On Wed, Jul 02, 2003 at 07:31:30PM +0200, Kai Schaetzl wrote:
> 
> It'd be easy enough to strip out nonsense like <frame></frame>, wouldn't it?

Yes, SA does this.  The problem in this instance was that the message
contained <frame><noframes>garbage</noframes></frame>, which (I suspect)
SA's HTML renderer reduces to "garbage", whereas IE's (for example)
discards the entire thing as nonsensical.
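The mismatch can be reproduced with two toy renderers built on Python's html.parser: one that strips tags and keeps all text (roughly what I suspect SA does here), and one that also drops everything inside <frame> and <noframes> (modelling what I suspect IE does).  Neither is the real SA or IE code; Renderer, render and the suppressed set are hypothetical.

```python
import re
from html.parser import HTMLParser


class Renderer(HTMLParser):
    """Collects visible text, optionally dropping everything inside some tags."""

    def __init__(self, suppressed=()):
        super().__init__()
        self.suppressed = set(suppressed)
        self.depth = 0        # how many suppressed elements we are inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.suppressed:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.suppressed and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.chunks.append(data)


def render(html, suppressed=()):
    r = Renderer(suppressed)
    r.feed(html)
    r.close()
    return re.sub(r"\s+", " ", "".join(r.chunks)).strip()


snippet = "Cheap <frame><noframes>garbage</noframes></frame> pills"

naive = render(snippet)                                   # keep all text
browser_like = render(snippet, suppressed={"frame", "noframes"})
print(naive)         # Cheap garbage pills
print(browser_like)  # Cheap pills
```

A filter that only sees the first output scores words the reader never sees; matching the second requires the scanner to share the browser's idea of which elements are invisible.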

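Returning to the 2.60 heuristic for text colored close to the background: I don't know the exact rule it uses, so here is only a minimal sketch of the general idea.  Euclidean distance in RGB space and the threshold of 30 are arbitrary assumptions, parse_hex_color and nearly_invisible are hypothetical names, and real mail would also need named colors ("white") and the interplay of <body> and <font> attributes, which this skips.

```python
import math


def parse_hex_color(value: str) -> tuple:
    """Turn '#rrggbb' or '#rgb' into an (r, g, b) tuple of 0-255 ints."""
    digits = value.strip().lstrip("#")
    if len(digits) == 3:                      # '#abc' is shorthand for '#aabbcc'
        digits = "".join(ch * 2 for ch in digits)
    if len(digits) != 6:
        raise ValueError(f"unsupported color: {value!r}")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(a: str, b: str) -> float:
    """Euclidean distance in RGB space: 0 for identical, about 441.7 max."""
    return math.dist(parse_hex_color(a), parse_hex_color(b))


def nearly_invisible(fg: str, bg: str, threshold: float = 30.0) -> bool:
    """Flag text whose color lies within `threshold` of the background's."""
    return color_distance(fg, bg) < threshold


# Near-white text on a white background is effectively hidden.
print(nearly_invisible("#fefefe", "#ffffff"))   # True
print(nearly_invisible("#000000", "#ffffff"))   # False
```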


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
