Tomoyuki Sakurai wrote:

>> Hopefully SA2.60 would solve it.

Gordon Cormack <[EMAIL PROTECTED]> writes:

> The version of 2.60 that I have sort of works in detecting obfuscated html.
>
> It *does* detect words split apart by html comments.
>
> It *does not* detect words split apart by bogus tags.

This is non-trivial to do accurately.  By accurately, I mean in a way
that does not hit legitimate (if non-standard) HTML.
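
To illustrate the difficulty, here is a rough sketch in Python (not
SpamAssassin's Perl, and not how 2.60 does it). The tag whitelist
KNOWN_TAGS and both function names are hypothetical. Stripping comments
is safe because comments never render; stripping "unknown" tags is only
as good as the whitelist.

```python
import re

# HTML comments never render, so removing them is always safe.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Opening or closing tag; group 1 captures the tag name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9:]*)[^>]*>")

# Hypothetical, deliberately tiny whitelist (an assumption for this sketch).
KNOWN_TAGS = {"a", "b", "br", "font", "i", "p", "span"}


def strip_comments(html):
    """Rejoin words that were split apart by HTML comments."""
    return COMMENT_RE.sub("", html)


def strip_unknown_tags(html, known=KNOWN_TAGS):
    """Drop any tag whose name is not whitelisted; keep the rest as-is."""
    def repl(match):
        return match.group(0) if match.group(1).lower() in known else ""
    return TAG_RE.sub(repl, html)


print(strip_comments("Via<!-- x -->gra"))
print(strip_unknown_tags("Via<zzz>gra"))
# A non-standard but legitimate tag is treated exactly like a bogus one:
print(strip_unknown_tags("<nobr>free</nobr> shipping"))
```

The last call shows the problem: a tag like <nobr> is non-standard but
used in legitimate mail, and a naive whitelist cannot tell it apart
from <zzz>.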

> It *does not* reconstruct obfuscated html for the benefit of the
> feature rules or the bayesian classifier.  (I've been tempted to
> pipe the html through lynx ...)

I'm not sure what you mean here.

> It *does not* remove text with fontcolor == backgroundcolor for
> the benefit of the bayesian classifier.

Yeah, this still needs to be tested a bit.  I'm not sure whether it
would make a significant difference.  2.60-cvs does have fairly decent
detection of invisible and low-contrast fonts, and they contribute to
the message score.  Unfortunately, there's a fair amount of poorly
written legitimate HTML that does this too, so the score will probably
only be about 1 to 2.
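
As a rough idea of what an invisible or low-contrast font check can
look like, here is a minimal Python sketch. It is not the 2.60-cvs
code: the function names, the distance measure (sum of per-channel
differences), and the threshold of 96 are all assumptions made for
illustration, and only six-digit hex colors are handled.

```python
def parse_hex_color(value):
    """Turn '#rrggbb' into an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected a six-digit hex color")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def color_distance(fg, bg):
    """Sum of absolute per-channel differences, from 0 to 765."""
    pairs = zip(parse_hex_color(fg), parse_hex_color(bg))
    return sum(abs(a - b) for a, b in pairs)


def classify_font(fg, bg, low_contrast_threshold=96):
    """Label text as invisible, low-contrast, or visible.

    The threshold is an arbitrary value chosen for this sketch.
    """
    distance = color_distance(fg, bg)
    if distance == 0:
        return "invisible"
    if distance < low_contrast_threshold:
        return "low-contrast"
    return "visible"


print(classify_font("#ffffff", "#FFFFFF"))
print(classify_font("#fefefe", "#ffffff"))
print(classify_font("#000000", "#ffffff"))
```

A check like this fires on sloppy legitimate HTML as well, which is
why the rule can only carry a modest score.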

-- 
Daniel Quinlan                     anti-spam (SpamAssassin), Linux, and open
http://www.pathname.com/~quinlan/   source consulting (looking for new work)


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
