On Mon, 17 Jun 2013, Amir 'CG' Caspi wrote:

At 7:20 PM -0700 06/15/2013, John Hardin wrote:
I took a closer look at this and it seems they're working around trivial gibberish detection by putting a valid CSS property at the very beginning of the style tag.

Revising the rules...

I am now seeing STYLE_GIBBERISH hitting on a lot of spam in the past day or so, since the new rules hit the distribution. So far, all TPs, no FPs.

Yay!

Would you be willing to create an HTML_COMMENT_GIBBERISH rule, which would be very similar to this one, but which looks for long strings of gibberish inside HTML comments? (That is, <!-- gibberish -->). A number of FN spams that leak through are using gibberish comments without gibberish styles. I would imagine detecting this should be quite similar to detecting style gibberish...

Well, that's a much harder problem. STYLE tags have a specified format, and content not matching that format is (fairly) easy to detect. Comments are freeform text - "gibberish" has the same meaning there that it does in regular body text.
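To illustrate why the STYLE case is tractable, here is a rough Python sketch (not the actual STYLE_GIBBERISH rule, which is a SpamAssassin regex rule). It strips anything shaped like "selector { property: value; }" and measures how much text is left over. The pattern CSS_RULE and the function style_residue_ratio are hypothetical names made up for this example, and the pattern is deliberately loose rather than a real CSS grammar.

```python
import re

# Hypothetical, deliberately loose pattern for one CSS rule:
#   selector { prop: value; prop: value }
# This is an illustration only, not a real CSS grammar.
CSS_RULE = re.compile(
    r"[^{}]+\{\s*(?:[-\w]+\s*:[^;{}]+;\s*)*(?:[-\w]+\s*:[^;{}]+)?\}"
)


def style_residue_ratio(style_text: str) -> float:
    """Return the fraction of non-whitespace characters that remain
    after everything that looks like a CSS rule has been removed."""
    total = len(re.sub(r"\s+", "", style_text))
    if total == 0:
        return 0.0
    residue = CSS_RULE.sub("", style_text)
    return len(re.sub(r"\s+", "", residue)) / total


# A valid rule up front followed by filler, as in the spam described above.
sample = "p { color: red; } asdkjh qwlkej zxmcnb"
ratio = style_residue_ratio(sample)
```

A high ratio means most of the style block does not fit the expected format. No comparable check exists for comments, because a comment has no format to deviate from.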

It's *possible* that converting the __LONGWORDS rules from body to rawbody and making them multiline would be justified, but there would have to be some discussion about that. They are at present unbounded and doing that conversion blindly could be Very Bad.

Perhaps a better approach would be to modify the HTML parser plugin to support rules regarding the size of HTML comments. This also could be done in a rawbody rule, but the size of comments may not be a useful spam sign.
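As a sketch of what "rules regarding the size of HTML comments" would need as input, the following Python uses the standard library's html.parser to collect the length of each comment. This is only a stand-in to show the idea; the SpamAssassin HTML parser is Perl and exposes nothing like this today as far as this example assumes. CommentSizer and comment_sizes are hypothetical names.

```python
from html.parser import HTMLParser


class CommentSizer(HTMLParser):
    """Collects the character length of every HTML comment seen."""

    def __init__(self):
        super().__init__()
        self.sizes = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->", delimiters excluded
        self.sizes.append(len(data))


def comment_sizes(html: str) -> list:
    """Return the lengths of all comments in the given HTML, in order."""
    parser = CommentSizer()
    parser.feed(html)
    parser.close()
    return parser.sizes


sizes = comment_sizes("<p>hi</p><!-- abc --><!--x-->")
```

A rule could then compare the largest value, or the total, against the size of the visible text. Whether any such threshold separates spam from ham would have to be checked against a corpus first.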

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Activist: Someone who gets involved.
  Unregistered Lobbyist: Someone who gets involved with something
    the MSM doesn't approve of.                           -- WizardPC
-----------------------------------------------------------------------
 Tomorrow: SWMBO's Birthday
