----- Original Message -----
From: "ian douglas" <[EMAIL PROTECTED]>
To: "Matthew Cline" <[EMAIL PROTECTED]>; "Spamassassin List" <[EMAIL PROTECTED]>
Sent: Friday, August 01, 2003 4:24 AM
Subject: RE: [SAtalk] those pesky small v*agra ads

> > Hmmmm, maybe we should make some new rules that test the ratio
> > of invisible text to visible text?
>
> But if the background is BLACK, white text is perfectly acceptable ...
> right?
>
> So defining "visible" vs "invisible" is your toughest chore.

Exactly. And that is close to impossible.

Last night I enthusiastically wrote three rules, which I have attached as a .txt file to avoid line wrapping:

1) MASKED_HTML_TEXT
This rule looks for a <body> element with a hex bgcolor attribute and matches that against a font color with the same value. That condition is marked as possible spam. It will match in:

$body = '<body bgcolor = "#fffffe"> yada yada yada <font face="two" color="#fffffe">';

2) MASKED_HTML_TEXT_1
Same as the first, but looks for color names (like "white"). It will match in:

$body = '<body bgcolor = "white"> yada yada yada <font face="two" color="white">';

3) MASKED_HTML_TEXT_2
Same as before, except that it looks for an empty <body> element (one that sets no bgcolor) and matches that with a font color of either "white" or "#ffffff". It will match in:

$body = '<body> yada yada yada <font face="two" color="#ffffff">';

That is the good news. :)

The bad news is that the true background color, or I should say background appearance, is almost impossible to determine. Consider table colors, <td> colors, etc. Not to mention a white, stretched GIF used as the background. And that is just 'old' style HTML. :)

Hence I gave my rules a low score. But still, you might find them useful.

- Mark

# <body> bgcolor given as a six-digit hex value, repeated as a font color.
# \043 is the octal escape for "#", which would otherwise start a comment.
full     MASKED_HTML_TEXT /\<[^>]*?body +?[^>]*?bgcolor[^>]*?(\043[0-9a-f]{6})[^>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)[^>]*?\>/mi
describe MASKED_HTML_TEXT Masked HTML text
score    MASKED_HTML_TEXT 0.5

# <body> bgcolor given as a color name (e.g. "white"), repeated as a font color.
full     MASKED_HTML_TEXT_1 /\<[^>]*?body +?[^>]*?bgcolor[^\043>]*?(\w{3,})[^\w>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(\1)\W[^>]*?\>/mi
describe MASKED_HTML_TEXT_1 Masked HTML text
score    MASKED_HTML_TEXT_1 0.5

# <body> without a bgcolor, combined with a white font color.
full     MASKED_HTML_TEXT_2 /\<[^>]*?body[^c>]*?\>(.|\s)*?\<[^>]*?font *?[^>]*?color[^>]*?(white|\043ffffff)[^>]*?\>/mi
describe MASKED_HTML_TEXT_2 Masked HTML text
score    MASKED_HTML_TEXT_2 0.3
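
For anyone who wants to try the patterns outside SpamAssassin, here is a rough Python sketch of the three rules run against the sample strings from the message. It is only an approximation of the rules, not how SpamAssassin itself evaluates them. The port makes a few assumptions: "#" is written literally instead of \043, [\s\S]*? replaces (.|\s)*?, the hex class is written as [0-9a-f] so that colors containing digits match too, and the names RULES, hits and the *_SAMPLE constants are made up for the illustration. The last sample is a hypothetical message with white text on a black table, to show the false positive described in the "bad news" paragraph.

```python
import re

FLAGS = re.MULTILINE | re.IGNORECASE  # counterpart of Perl's /mi

# Python ports of the three rules (illustrative, not the SpamAssassin engine).
RULES = {
    "MASKED_HTML_TEXT": re.compile(
        r'<[^>]*?body +?[^>]*?bgcolor[^>]*?(#[0-9a-f]{6})[^>]*?>[\s\S]*?'
        r'<[^>]*?font *?[^>]*?color[^>]*?\1[^>]*?>', FLAGS),
    "MASKED_HTML_TEXT_1": re.compile(
        r'<[^>]*?body +?[^>]*?bgcolor[^#>]*?(\w{3,})[^\w>]*?>[\s\S]*?'
        r'<[^>]*?font *?[^>]*?color[^>]*?\1\W[^>]*?>', FLAGS),
    "MASKED_HTML_TEXT_2": re.compile(
        r'<[^>]*?body[^c>]*?>[\s\S]*?'
        r'<[^>]*?font *?[^>]*?color[^>]*?(?:white|#ffffff)[^>]*?>', FLAGS),
}

def hits(html):
    """Return the sorted names of the rules whose pattern is found in html."""
    return sorted(name for name, rx in RULES.items() if rx.search(html))

# The three samples from the message.
HEX_SAMPLE = '<body bgcolor = "#fffffe"> yada yada yada <font face="two" color="#fffffe">'
NAME_SAMPLE = '<body bgcolor = "white"> yada yada yada <font face="two" color="white">'
EMPTY_SAMPLE = '<body> yada yada yada <font face="two" color="#ffffff">'

# Hypothetical legitimate mail: white text inside a black table cell.
TABLE_SAMPLE = ('<body><table bgcolor="black"><tr><td>'
                '<font color="white">hello</font></td></tr></table></body>')

if __name__ == "__main__":
    for label, sample in [("hex", HEX_SAMPLE), ("name", NAME_SAMPLE),
                          ("empty", EMPTY_SAMPLE), ("table", TABLE_SAMPLE)]:
        print(label, hits(sample))
```

Each of the first three samples trips only its own rule. The table sample trips MASKED_HTML_TEXT_2 even though its text is perfectly readable, which is the reason for keeping the scores low.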