On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:
At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around trivial
gibberish detection by putting a valid CSS property at the very beginning
of the style tag.
Revising the rules...
I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day or
so, since the new rules hit the distribution. So far, all TPs, no FPs.
Yay!
Would you be willing to create an HTML_COMMENT_GIBBERISH rule, which
would be very similar to this one, but which looks for long strings of
gibberish inside HTML comments? (That is, <!-- gibberish -->). A
number of FN spams that leak through are using gibberish comments
without gibberish styles. I would imagine detecting this should be
quite similar to detecting style gibberish...
Well, that's a much harder problem. STYLE tags have a specified format,
and content not matching that format is (fairly) easy to detect. Comments
are freeform text - "gibberish" has the same meaning there that it does in
regular body text.
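
To illustrate why the STYLE case is tractable, here is a rough sketch in Python. It is not the actual STYLE_GIBBERISH rule, which is not shown in this thread. The function name and the simplified declaration pattern are assumptions made up for the example. It splits the contents of a style tag on braces and semicolons, skips anything that looks like a selector, and reports what fraction of the remaining chunks fail to look like "property: value".

```python
import re

# Simplified shape of one CSS declaration, e.g. "color: red".
# This is an assumption for illustration, not the full CSS grammar.
DECLARATION = re.compile(r"^[a-zA-Z-]+\s*:\s*\S.*$", re.DOTALL)


def non_css_ratio(style_text):
    """Return the fraction of chunks that do not look like CSS declarations."""
    # Splitting with a capture group keeps the delimiters:
    # even indexes hold text chunks, odd indexes hold '{', '}' or ';'.
    parts = re.split(r"([{};])", style_text)
    checked = 0
    bad = 0
    for i in range(0, len(parts), 2):
        chunk = parts[i].strip()
        delimiter = parts[i + 1] if i + 1 < len(parts) else ""
        # Skip empty chunks, and treat text followed by '{' as a selector.
        if not chunk or delimiter == "{":
            continue
        checked += 1
        if not DECLARATION.match(chunk):
            bad += 1
    return bad / checked if checked else 0.0


# A valid property up front followed by filler still leaves a high ratio.
print(non_css_ratio("p { color: red; margin: 0 }"))
print(non_css_ratio("color: red; qwe rty uio asd fgh"))
```

Nothing comparable exists for comments, because there is no expected format to compare the text against.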
It's *possible* that converting the __LONGWORDS rules from body to rawbody
and making them multiline would be justified, but there would have to be
some discussion about that. They are at present unbounded and doing that
conversion blindly could be Very Bad.
Perhaps a better approach would be to modify the HTML parser plugin to
support rules regarding the size of HTML comments. This also could be done
in a rawbody rule, but the size of comments may not be a useful spam sign.
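
For anyone who wants to check their own corpus before a rule is proposed, measuring comment sizes in the raw HTML is simple. The following is only a sketch in Python, not SpamAssassin plugin code, and the function name is hypothetical. It uses plain string searches rather than an unbounded regex, so the scan stays linear in the size of the message.

```python
def comment_sizes(raw_html):
    """Return the length of the text inside each <!-- ... --> comment."""
    sizes = []
    pos = 0
    while True:
        start = raw_html.find("<!--", pos)
        if start == -1:
            break
        end = raw_html.find("-->", start + 4)
        if end == -1:
            # Unterminated comment: stop rather than guess where it ends.
            break
        sizes.append(end - (start + 4))
        pos = end + 3
    return sizes


sample = "<p>Hello</p><!-- abc --><b>x</b><!---->"
print(comment_sizes(sample))
```

Comparing the distribution of these sizes between ham and spam would show whether comment size is a useful sign at all.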
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Activist: Someone who gets involved.
Unregistered Lobbyist: Someone who gets involved with something
the MSM doesn't approve of. -- WizardPC
-----------------------------------------------------------------------
Tomorrow: SWMBO's Birthday