On Sat, 20 Nov 2010, David B Funk wrote:

> The idea was that almost all legit 3-character HTML tags such as '<div>'
> contained at least one of those letters ([dpry]) in them. So a purported
> tag that had none of them was not legit and thus probably bogus spammer
> spoor.
> With the evolution of HTML (xml, etc) that's no longer a safe
> assumption, so that rule probably FPs.
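
To make the heuristic concrete, here is a rough Python sketch of the check as
described above. It is not the original rule (which is not shown in this
thread); the tag list is a hand-picked, hypothetical sample and the function
name is made up for illustration.

```python
import re

# Hand-picked 3-character tag names (an illustrative, hypothetical sample,
# not an exhaustive list and not taken from any SpamAssassin rule).
SAMPLE_TAGS = ["div", "pre", "dir", "map", "kbd", "var", "nav", "svg"]

# The heuristic as described: a legit 3-character tag name contains
# at least one of the letters d, p, r or y.
HAS_EXPECTED_LETTER = re.compile(r"[dpry]", re.IGNORECASE)


def looks_bogus(tag_name: str) -> bool:
    """Return True if a 3-character tag name has none of d, p, r, y."""
    return len(tag_name) == 3 and not HAS_EXPECTED_LETTER.search(tag_name)


# Tags from the sample that the heuristic would treat as spammer spoor.
flagged = [t for t in SAMPLE_TAGS if looks_bogus(t)]
print(flagged)
```

Any legitimate tag that ends up in `flagged` (in this sample, newer tags such
as `nav` and `svg`) would be a false positive for a rule built on this
assumption.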

The presence of multiple empty tag pairs might still be useful...

Off the top of my head and untested:

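# Match an empty open/close tag pair such as <b></b>
rawbody __EMPTY_HTML_TAG  m,<([a-z]+)></\1>,i
# Count every occurrence, not just the first one
tflags  __EMPTY_HTML_TAG  multiple
# Fire when the message contains ten or more empty pairs
meta    MANY_EMPTY_TAGS   __EMPTY_HTML_TAG > 9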
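
For a quick sanity check of the pattern outside SpamAssassin, the same idea
can be sketched in Python. This is only an approximation: Python's re module
stands in for Perl's regex engine, the whole message is treated as a single
string rather than the chunks a rawbody rule sees, and the function names and
the threshold parameter are made up for illustration.

```python
import re

# Same pattern as __EMPTY_HTML_TAG: an opening tag immediately followed
# by its matching closing tag, e.g. <b></b>. IGNORECASE mirrors the /i flag.
EMPTY_TAG_PAIR = re.compile(r"<([a-z]+)></\1>", re.IGNORECASE)


def count_empty_tag_pairs(html: str) -> int:
    """Count non-overlapping empty tag pairs, like 'tflags multiple'."""
    return sum(1 for _ in EMPTY_TAG_PAIR.finditer(html))


def many_empty_tags(html: str, threshold: int = 9) -> bool:
    """Mimic the meta rule: true when the count exceeds the threshold."""
    return count_empty_tag_pairs(html) > threshold


if __name__ == "__main__":
    sample = "<p>hello</p>" + "<b></b>" * 10
    print(count_empty_tag_pairs(sample), many_empty_tags(sample))
```

Tags with content or mismatched names (`<b>text</b>`, `<b></i>`) are not
counted, so ordinary markup should stay well under the threshold.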

This might already be a rule, I didn't look.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Activist: Someone who gets involved.
  Unregistered Lobbyist: Someone who gets involved with something
    the MSM doesn't approve of.                           -- WizardPC
-----------------------------------------------------------------------
 27 days until TRON Legacy
