On Tue, Apr 03, 2012 at 11:00:56PM +0300, Henrik K wrote:
> On Mon, Apr 02, 2012 at 12:40:27PM -0400, Kris Deugau wrote:
> > Can anyone point out what bit of stupidity I'm committing in trying
> > to use this:
> > 
> > rawbody OVERSIZE_COMMENT        m|<!--(?!-->).{32000,}|s
> > 
> > to match messages that are mostly very very long HTML comment(s)?
> > 
> > Testing the same regex against the whole raw message outside of SA
> > seems to fire just fine.
> 
> HTML parser already has all the information needed. Simply use the existing
> HTMLEval method:
> 
> body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=.{32000})')
> 
> (?s) to enable single-line mode
> (?=) lookahead to prevent SA storing the match result (save memory :p)
> 
> This only checks the "main" message body that SA uses. If you want to check
> _all_ mime parts, here's a quick plugin:
> 
> http://sa.hege.li/HTMLComments.pm

PS. Learn something new every day... it seems Perl regex quantifiers can't
be bigger than 32766. To test for anything longer you need a hack such as
nesting two quantifiers, e.g. 50 x 1000 = 50000 characters:
(?=(?:.{1000}){50})
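
Dropped into the html_text_match rule quoted above, that would look roughly
like this. It is an untested sketch: the 50000-character threshold is only an
example value, and the rule name is simply reused from the thread.

```
# Fire when an HTML comment is at least 50 x 1000 = 50000 characters long.
# (?s) lets . match newlines; the lookahead keeps SA from storing the match.
body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=(?:.{1000}){50})')
```

Keeping each individual quantifier well under 32766 and multiplying them
should let the threshold be raised further the same way.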
