Henrik K wrote:
On Mon, Apr 02, 2012 at 12:40:27PM -0400, Kris Deugau wrote:
Can anyone point out what bit of stupidity I'm committing in trying
to use this:

rawbody OVERSIZE_COMMENT        m|<!--(?!-->).{32000,}|s

to match messages that are mostly very very long HTML comment(s)?

Testing the same regex against the whole raw message outside of SA
seems to fire just fine.

The HTML parser already has all the information needed. Simply use the existing
HTMLEval method:

body OVERSIZE_COMMENT eval:html_text_match('comment', '(?s)^(?=.{32000})')
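
For anyone who wants to poke at that pattern outside SA: the second argument is only a length test. `(?s)` lets `.` match newlines, `^` anchors at the start, and the lookahead succeeds only if at least 32000 characters follow. Below is a minimal Python sketch of just that part. Python's `re` treats these constructs the same way Perl does, but `has_min_length` is a made-up helper name, and how SA hands the 'comment' text to the regex is not modelled here.

```python
import re

# Same pattern as in the rule above: (?s) makes "." match newlines,
# and the lookahead needs at least 32000 characters after the start.
LENGTH_TEST = re.compile(r'(?s)^(?=.{32000})')

def has_min_length(text):
    """Hypothetical helper: True if text is at least 32000 characters long."""
    return LENGTH_TEST.search(text) is not None

short_text = "x" * 31999
long_text = "line\n" * 6400  # 32000 characters, newlines included
```

Without the `(?s)` the same pattern would fail on `long_text`, because `.` would stop at the first newline.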

Interesting!  I'll try that out too.

This only checks the "main" message body that SA uses. If you want to check
_all_ mime parts, here's a quick plugin:

http://sa.hege.li/HTMLComments.pm

Hm. Does check_html_comment_length get each tag all by itself? Otherwise it looks like the regex in your while() will match a message with a short opening comment, $find_len of miscellaneous content or HTML tags, and a short closing comment.
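
To make the concern concrete, here is a minimal Python sketch. The plugin's regex isn't quoted in this thread, so `SPANNING` is only an assumed stand-in for a pattern that lets `.` run across a `-->`; `PER_COMMENT`, `longest_comment` and the small `FIND_LEN` are likewise illustrative names and values. The second pattern stops at the first `-->`, so each comment is measured by itself.

```python
import re

FIND_LEN = 100  # small stand-in for $find_len, to keep the example short

# Assumed stand-in: "." may run straight past a "-->" and into later content.
SPANNING = re.compile(r'<!--.{%d,}-->' % FIND_LEN, re.S)

# Non-greedy body: the match ends at the first "-->", i.e. one comment at a time.
PER_COMMENT = re.compile(r'<!--(.*?)-->', re.S)

def longest_comment(html):
    """Length of the longest single comment body, or 0 if there are none."""
    return max((len(m.group(1)) for m in PER_COMMENT.finditer(html)), default=0)

# Two short comments with plenty of ordinary markup in between.
sample = "<!-- a -->" + "<p>text</p>" * 50 + "<!-- b -->"
```

With this input `SPANNING.search(sample)` finds a match even though neither comment is more than a few characters long, while `longest_comment(sample)` stays far below `FIND_LEN`.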

-kgd
