On Tue, Apr 03, 2012 at 11:00:56PM +0300, Henrik K wrote:
> On Mon, Apr 02, 2012 at 12:40:27PM -0400, Kris Deugau wrote:
> > Can anyone point out what bit of stupidity I'm committing in trying
> > to use this:
> >
> > rawbody OVERSIZE_COMMENT m|<!--(?!-->).{32000,}|s
> >
> > to match messages that are mostly very very long HTML comment(s)?
> >
> > Testing the same regex against the whole raw message outside of SA
> > seems to fire just fine.
>
> HTML parser already has all the information needed. Simply use the existing
> HTMLEval method:
>
> body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=.{32000})')
>
> (?s) to enable single-line mode
> (?=) lookahead to prevent SA storing the match result (save memory :p)
>
> This only checks the "main" message body that SA uses. If you want to check
> _all_ mime parts, here's a quick plugin:
>
> http://sa.hege.li/HTMLComments.pm

PS. Learn something new every day... it seems Perl regex quantifiers (see
perlre) can't be bigger than 32766. To test for anything longer you need
some hack like nesting the quantifiers:

  (?=(?:.{1000}){50})

which requires 50 x 1000 = 50000 characters.
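
For what it's worth, here is a rough sketch of how the two pieces might fit
together in a local .cf file if you wanted a threshold above the quantifier
limit. The 50000 threshold, the describe text and the score are placeholders
made up for illustration, not tested values. It also assumes the HTMLEval
plugin is loaded, as in the rule quoted above.

```
# Sketch only: threshold, description and score are illustrative assumptions.
# (?s)             lets . match newlines
# (?=...)          lookahead, so nothing is captured or stored
# (?:.{1000}){50}  nested quantifiers: 50 * 1000 = 50000 characters
body     OVERSIZE_COMMENT  eval:html_text_match('comment', '(?s)^(?=(?:.{1000}){50})')
describe OVERSIZE_COMMENT  HTML comment text is at least 50000 characters long
score    OVERSIZE_COMMENT  0.1
```

For the original 32000 threshold the plain .{32000} form is still fine, since
that is under the limit; the nested form only matters once you go past 32766.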