RE: [SAtalk] Graphics-only spam?

Michael Moncur Wed, 24 Jul 2002 09:31:11 -0700

> or, alternatively, giveaway patterns in the HTML?  That's what's happening
> with most obfuscating techniques the spammers are trying -- they become
> a very reliable sign of spam in themselves!


I agree. I've been playing with a little Perl script that calculates the ratio
(length of full body after stripping HTML tags) / (length of full body
including HTML tags), and it seems like a very good spam indicator in
preliminary testing. Image-only spam ends up with a low ratio (0.1-0.2),
regular HTML spam or ad copy is higher (0.6 or so), and non-spam is usually
near 1.0.
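
For anyone who wants to try the idea, here is a rough sketch of the ratio
described above. It is written in Python rather than Perl and is not the
script from this message; the function name text_to_markup_ratio and the
choice to treat an empty body as 1.0 are assumptions made for illustration.
It uses the standard library's HTML parser instead of a regexp, so text
inside script or style elements still counts as body text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores tags and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_to_markup_ratio(raw_body: str) -> float:
    """Return len(body without tags) / len(body with tags).

    Hypothetical helper: an empty body returns 1.0 (an assumption) so that
    it is not mistaken for image-only mail.
    """
    if not raw_body:
        return 1.0
    parser = _TextExtractor()
    parser.feed(raw_body)
    parser.close()  # flush any text still buffered by the parser
    stripped = "".join(parser.chunks)
    return len(stripped) / len(raw_body)


if __name__ == "__main__":
    image_only = '<html><body><img src="http://example.com/a.gif"></body></html>'
    print(text_to_markup_ratio(image_only))
    print(text_to_markup_ratio("Just a plain text message."))
```

The cutoffs quoted above (0.1-0.2, about 0.6, near 1.0) come from
preliminary testing only, so any threshold built on this ratio would need
checking against a real corpus.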

This would be an easy rule to add to SA, but I'm wondering about speed:
stripping HTML tags takes a messy regexp, and SA already does this. Is there a
way for the same eval test to access the 'rawbody' and 'body' parts at the
same time? It looked to me like my only choice right now would be to make it
a rawbody test and strip the tags myself.

--
michael moncur   mgm at starlingtech.com   http://www.starlingtech.com/
"Nobody can be exactly like me.  Even I have trouble doing it."
                -- Tallulah Bankhead



