On 1/30/2014 6:37 PM, David B Funk wrote:
On Thu, 30 Jan 2014, Amir Caspi wrote:

On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:

If you want to share the complete rule, I can throw it into my sandbox and see what masscheck thinks as well.


The complete rule would be something like this, assuming Andy implemented it as I wrote it:

rawbody HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
describe HTML_NONSENSE_TAGS Many consecutive multi-letter HTML tags, likely nonsense/spam
score HTML_NONSENSE_TAGS 0.001

Actually, that unbounded {10,} repeat can be written as an explicit {10} without reducing the effectiveness of the rule, and that makes it more CPU efficient. Once you've found at least 10 consecutive pseudo-tags, do you care whether there are more than 10? You're not looking for anything specific after the match, nor doing anything with the exact number of them.
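A quick way to sanity-check the {10} vs {10,} point is to run both patterns over a few bodies and confirm they fire on exactly the same inputs, while the bounded one consumes less text. This is only a sketch: it uses Python's `re` as a stand-in for the Perl regex engine SpamAssassin actually uses (the syntax of this particular pattern means the same thing in both), and the sample bodies and the `fires()` helper are hypothetical, made up for illustration.

```python
import re

# Rule body as posted (unbounded repeat) and the suggested bounded variant.
UNBOUNDED = re.compile(r'(?:<[A-Za-z0-9]{4,}>\s*){10,}')
BOUNDED = re.compile(r'(?:<[A-Za-z0-9]{4,}>\s*){10}')

def fires(pattern, body):
    """Hypothetical helper: True if the pattern matches anywhere in the raw body."""
    return pattern.search(body) is not None

# Hypothetical sample bodies, not taken from real spam.
samples = {
    "nine_tags": "<abcd> " * 9,                              # one short of the threshold
    "ten_tags": "<abcd> " * 10,                              # exactly at the threshold
    "fifty_tags": "<qwer>\n" * 50,                           # far past the threshold
    "short_tags": "<p> <b> <i> " * 20,                       # tag names under 4 chars
    "normal_html": "<html><body><p>Hello</p></body></html>",
}

for name, body in samples.items():
    # Both patterns agree on whether the rule would hit...
    assert fires(UNBOUNDED, body) == fires(BOUNDED, body)
    m_unb, m_bnd = UNBOUNDED.search(body), BOUNDED.search(body)
    if m_unb:
        # ...but the bounded pattern stops after the 10th pseudo-tag,
        # while the unbounded one keeps consuming every tag that follows.
        print(name, "hits; unbounded consumed", len(m_unb.group()),
              "chars, bounded consumed", len(m_bnd.group()))
    else:
        print(name, "no hit")
```

On the fifty-tag body the unbounded pattern keeps going to the end of the run, while the bounded one quits after 10 tags. Since nothing follows the repeat in the pattern and the rule only needs a yes/no answer, that extra work buys nothing.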

Just an FYI: I checked a bit ago, and AC_HTML_NONSENSE_TAGS has been promoted to a published rule with a score of 0.51.

Regards,
KAM
