On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

> If you want to share the complete rule, I can throw it into my sandbox and 
> see what masscheck thinks as well.

The complete rule would be something like this, assuming Andy implemented it as 
I wrote it:

rawbody HTML_NONSENSE_TAGS      /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
describe HTML_NONSENSE_TAGS     Many consecutive multi-letter HTML tags, likely nonsense/spam
score HTML_NONSENSE_TAGS        0.001

Score to be adjusted as needed, of course.

If one wants to be even more explicit, one could require that the tags be 
prefaced with a <style> tag, although that should, hopefully, get picked up by 
John Hardin's modifications to STYLE_GIBBERISH sometime in the near future.
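
As a rough, untested sketch of that stricter variant, it might look something 
like the following. The rule name HTML_NONSENSE_TAGS_STYLE is made up here for 
illustration, the pattern assumes the <style> tag immediately precedes the run 
of nonsense tags, and none of this has been through masscheck:

# Hypothetical variant: same tag run as above, but only after a <style ...> tag
rawbody HTML_NONSENSE_TAGS_STYLE  /<style[^>]*>\s*(?:<[A-Za-z0-9]{4,}>\s*){10,}/i
describe HTML_NONSENSE_TAGS_STYLE Nonsense HTML tags right after a <style> tag
score HTML_NONSENSE_TAGS_STYLE    0.001

The [^>]* allows for attributes on the <style> tag, and the /i is there only so 
that <STYLE> matches as well. As with the original rule, the score is a 
placeholder to be adjusted as needed.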

Cheers.

--- Amir
