On 1/30/2014 6:37 PM, David B Funk wrote:
On Thu, 30 Jan 2014, Amir Caspi wrote:
On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
If you want to share the complete rule, I can throw it into my
sandbox and see what masscheck thinks as well.
The complete rule would be something like this, assuming Andy
implemented it as I wrote it:
rawbody HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
describe HTML_NONSENSE_TAGS Many consecutive multi-letter HTML tags, likely nonsense/spam
score HTML_NONSENSE_TAGS 0.001
Actually, that unbounded {10,} repeat can be written as an explicit {10}
without reducing the effectiveness of the rule, and that makes it more
CPU efficient. I.e., once you've found at least 10 consecutive
pseudo-tags, do you care if there are more than 10? You're not looking
for anything specific after the match, nor doing anything with the
exact number of them.
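A quick way to convince yourself of the point above is to run both forms of the pattern over the same inputs. The sketch below uses Python's re module rather than SpamAssassin's Perl engine, which is an assumption of convenience; the pattern itself is copied from the rule, and the fake_tags helper and its sample bodies are made up for illustration. It only checks that the two quantifiers agree on hit or no hit and that the bounded form consumes less text. It does not measure CPU time.

```python
import re

# Patterns copied from the rule under discussion; only the quantifier differs.
UNBOUNDED = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10,}")
BOUNDED = re.compile(r"(?:<[A-Za-z0-9]{4,}>\s*){10}")


def fake_tags(count):
    """Build `count` consecutive made-up pseudo-tags (hypothetical test data)."""
    return " ".join("<abcd%d>" % i for i in range(count))


# Bodies with fewer than, exactly, and more than ten pseudo-tags.
samples = {n: fake_tags(n) for n in (0, 9, 10, 11, 50)}

for n, body in samples.items():
    hit_unbounded = UNBOUNDED.search(body) is not None
    hit_bounded = BOUNDED.search(body) is not None
    # The rule only needs a yes/no answer, so these two columns should agree.
    print(n, hit_unbounded, hit_bounded)

# The bounded form stops after ten tags; the unbounded form keeps going.
long_body = samples[50]
print(BOUNDED.search(long_body).end(), UNBOUNDED.search(long_body).end())
```

Since a rawbody rule only fires or doesn't, the extra text matched by {10,} buys nothing here; it would only matter if something had to follow the run of tags.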
Just an FYI that I checked a bit ago and AC_HTML_NONSENSE_TAGS was
promoted to a published rule scoring 0.51.
Regards,
KAM