> or, alternatively, giveaway patterns in the HTML? That's what's happening
> with most obfuscating techniques the spammers are trying -- they become
> a very reliable sign of spam in themselves!
I agree. I've been playing with a little Perl script that calculates a ratio of (length of full body after stripping HTML tags) / (length of full body including HTML tags), and it seems like a very good spam indicator in preliminary testing. Image-only spam ends up with a low ratio (0.1-0.2), regular HTML spam or ad copy is higher (0.6 or so), and non-spam is usually near 1.0.

This would be an easy rule to add to SA, but I'm wondering about speed: stripping HTML tags takes a messy regexp, and SA already does this. Is there a way for the same eval test to access the 'rawbody' and 'body' parts at the same time? It looked to me like my only choice right now would be to make it a rawbody test and strip the tags myself.

--
michael moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"Nobody can be exactly like me. Even I have trouble doing it."
  -- Tallulah Bankhead
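For anyone who wants to try the ratio described above, here is a minimal sketch of the calculation. It is written in Python rather than Perl and is not the script mentioned in the post. The function name `text_to_markup_ratio` and the class `TextExtractor` are made up for illustration, it uses the standard library's `html.parser` instead of a regexp to drop the tags, and returning 1.0 for an empty body is an arbitrary assumption. Nothing here is part of SpamAssassin.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text nodes and discards tag markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags and comments never reach here.
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)


def text_to_markup_ratio(raw_body):
    """Return len(body with tags stripped) / len(body including tags).

    Assumption: an empty body is treated as ratio 1.0 (nothing to strip).
    """
    if not raw_body:
        return 1.0
    parser = TextExtractor()
    parser.feed(raw_body)
    parser.close()  # flush any text still buffered by the parser
    return len(parser.text()) / len(raw_body)


if __name__ == "__main__":
    samples = {
        "plain text": "Meeting moved to 3pm, see you there.",
        "image only": '<html><body><img src="http://example.com/a.gif"></body></html>',
    }
    for label, body in samples.items():
        print(label, round(text_to_markup_ratio(body), 2))
```

A plain-text body comes out at 1.0 and a body that is nothing but tags comes out at 0.0. Where to draw the line between them is left open; the ranges quoted above (0.1-0.2, around 0.6, near 1.0) are from preliminary testing only.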