On Thu, 30 Jan 2014, Amir Caspi wrote:
On Jan 30, 2014, at 10:28 AM, Kevin A. McGrail <kmcgr...@pccc.com> wrote:
If you want to share the complete rule, I can throw it into my sandbox and see
what masscheck thinks as well.
The complete rule would be something like this, assuming Andy implemented it as
I wrote it:
rawbody HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/
describe HTML_NONSENSE_TAGS Many consecutive multi-letter HTML tags, likely nonsense/spam
score HTML_NONSENSE_TAGS 0.001
Score to be adjusted as needed, of course.
I'd suggest writing it as a subrule first, to see how well it performs
against the masscheck corpora. If it does well by itself (good hits, high
S/O), then a meta can be added to expose it for scoring. If it hits a lot
but the S/O ratio is low, then it could be analyzed for possible
combinations with other rules to get something that performs well.
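
As a rough sketch of that approach: a rule whose name starts with a double
underscore is treated by SpamAssassin as an unscored subrule. The name
__HTML_NONSENSE_TAGS below is only an illustrative choice, not something
anyone in this thread has committed. The first step would be just the subrule:

# Subrule only: no score and no description, so it can be evaluated in masscheck
rawbody __HTML_NONSENSE_TAGS /(?:<[A-Za-z0-9]{4,}>\s*){10,}/

If the masscheck results look good, a meta along these lines could then
expose it for scoring (the 0.001 score is the placeholder from the rule above):

# Scored wrapper around the subrule; adjust the score once results are in
meta HTML_NONSENSE_TAGS __HTML_NONSENSE_TAGS
describe HTML_NONSENSE_TAGS Many consecutive multi-letter HTML tags, likely nonsense/spam
score HTML_NONSENSE_TAGS 0.001

If the S/O turns out low, the meta expression is also where the subrule could
be combined with other rules, for example with && against another subrule.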
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
For those who are being swayed by Microsoft's whining about the
GPL, consider how aggressively viral their Shared Source license is:
If you've *ever* seen *any* MS code covered by the Shared Source
license, you're infected for life. MS can sue you for Intellectual
Property misappropriation whenever they like, so you'd better not
come up with any Innovative Ideas that they want to Embrace...
-----------------------------------------------------------------------
2 days until the 11th anniversary of the loss of STS-107 Columbia