----- Original Message ----- 
From: "ian douglas" <[EMAIL PROTECTED]>
To: "Matthew Cline" <[EMAIL PROTECTED]>; "Spamassassin List"
<[EMAIL PROTECTED]>
Sent: Friday, August 01, 2003 4:24 AM
Subject: RE: [SAtalk] those pesky small v*agra ads


> > Hmmmm, maybe we should make some new rules that test the ratio
> > of invisible text to visible text?
>
> But if the background is BLACK, white text is perfectly acceptable ...
> right?
>
> So defining "visible" vs "invisible" is your toughest chore.

Exactly. And that is close to impossible. Last night I enthusiastically wrote
three rules, which I have attached as a .txt file to avoid line wrapping:

1): MASKED_HTML_TEXT

This rule looks for a <body> element with a hex bgcolor attribute, and
matches that against a later <font> color with the same value. That
condition is flagged as possible spam. It will match in:

$body = '<body bgcolor = "#fffffe"> yada yada
yada <font face="two" color="#fffffe">';

2): MASKED_HTML_TEXT_1

Same as rule 1, but looks for named colors (like "white"). It will match
in:

$body = '<body bgcolor = "white"> yada yada
yada <font face="two" color="white">';

3): MASKED_HTML_TEXT_2

Same as before, except that it looks for a bare <body> tag (no bgcolor) and
pairs that with a font color of either "white" or "#ffffff". It will match in:

$body = '<body> yada yada
yada <font face="two" color="#ffffff">';

That is the good news. :) The bad news is that the true background color, or
I should say, background appearance, is almost impossible to determine.
Consider table colors, <td> colors, etc. Not to mention a white, stretched
GIF used as the background. And that is just 'old' style HTML. :)
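
To make that concrete, here is a rough Python sketch (Python only because it
is easy to run outside SpamAssassin; the pattern is the one from
MASKED_HTML_TEXT, and the two HTML snippets are made-up examples, not real
spam). It shows one case where the rule fires on perfectly visible text, and
one where hidden text slips through:

```python
import re

# Python port of the MASKED_HTML_TEXT pattern; \043 is the octal escape for "#".
MASKED_HTML_TEXT = re.compile(
    r'\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[a-f]{6})[^>]*?\>(.|\s)*?'
    r'\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>',
    re.M | re.I,
)

# Hypothetical message: white text, but inside a black table, so it is visible.
visible_on_black = (
    '<body bgcolor="#ffffff"><table bgcolor="#000000"><tr><td>'
    '<font color="#ffffff">visible text</font></td></tr></table></body>'
)

# Hypothetical message: black text in a black cell, so it is hidden,
# but the <body> tag carries no bgcolor at all.
hidden_in_cell = (
    '<body><table><tr><td bgcolor="#000000">'
    '<font color="#000000">hidden text</font></td></tr></table></body>'
)

false_positive = MASKED_HTML_TEXT.search(visible_on_black) is not None
false_negative = MASKED_HTML_TEXT.search(hidden_in_cell) is None

print("fires on visible text:", false_positive)
print("misses hidden text:", false_negative)
```

Both come out True, because the rule only ever compares the <body> tag with a
<font> tag and knows nothing about what sits in between.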

Hence I gave my rules a low score. But still, you might find them useful.

- Mark
full     MASKED_HTML_TEXT         /\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[a-f]{6})[^>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>/mi
describe MASKED_HTML_TEXT         Masked HTML text
score    MASKED_HTML_TEXT         0.5

full     MASKED_HTML_TEXT_1       /\<[^>]*?body +?[^>]*?bgcolor[^\043>]*?(\w{3,})[^\w>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)\W[^>]*?\>/mi
describe MASKED_HTML_TEXT_1       Masked HTML text
score    MASKED_HTML_TEXT_1       0.5

full     MASKED_HTML_TEXT_2       /\<[^>]*?body[^c>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(white|\043ffffff)[^>]*?\>/mi
describe MASKED_HTML_TEXT_2       Masked HTML text
score    MASKED_HTML_TEXT_2       0.3
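
For anyone who wants to poke at the patterns without installing them, a
minimal Python harness along these lines should do. It is only a sketch: the
regexes are copied from the three rules above with the same /mi modifiers, the
sample bodies are the ones from the message, and the names RULES, SAMPLES and
rules_hit are made up for this example. It does not reproduce how
SpamAssassin itself feeds a message to a "full" rule.

```python
import re

FLAGS = re.M | re.I  # same modifiers as the /mi on the rules

# Patterns copied from the three rules; \043 is the octal escape for "#".
RULES = {
    "MASKED_HTML_TEXT": re.compile(
        r'\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[a-f]{6})[^>]*?\>(.|\s)*?'
        r'\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>', FLAGS),
    "MASKED_HTML_TEXT_1": re.compile(
        r'\<[^>]*?body +?[^>]*?bgcolor[^\043>]*?(\w{3,})[^\w>]*?\>(.|\s)*?'
        r'\<[^>]*?font *?[^>]*?color[^>]*?(\1)\W[^>]*?\>', FLAGS),
    "MASKED_HTML_TEXT_2": re.compile(
        r'\<[^>]*?body[^c>]*?\>(.|\s)*?'
        r'\<[^>]*?font *?[^>]*?color[^>]*?(white|\043ffffff)[^>]*?\>', FLAGS),
}

# The three example bodies from the message, one per rule.
SAMPLES = {
    "hex": '<body bgcolor = "#fffffe"> yada yada\n'
           'yada <font face="two" color="#fffffe">',
    "named": '<body bgcolor = "white"> yada yada\n'
             'yada <font face="two" color="white">',
    "bare": '<body> yada yada\n'
            'yada <font face="two" color="#ffffff">',
}


def rules_hit(body):
    """Return the names of the rules whose pattern matches the message body."""
    return [name for name, rx in RULES.items() if rx.search(body)]


for label, body in SAMPLES.items():
    print(label, rules_hit(body))
```

Each sample should trigger only its own rule. One thing the harness makes easy
to see: as written, `[a-f]{6}` in the first rule only accepts hex colors made
entirely of the letters a to f, so a value such as "#000000" is not caught.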
