On 4/2/2012 12:58 PM, Stephane Chazelas wrote:
> 2012-04-02 12:40:27 -0400, Kris Deugau:
>> Can anyone point out what bit of stupidity I'm committing in trying
>> to use this:
>>
>> rawbody OVERSIZE_COMMENT        m|<!--(?!-->).{32000,}|s
>>
>> to match messages that are mostly very very long HTML comment(s)?
>>
>> Testing the same regex against the whole raw message outside of SA
>> seems to fire just fine.
> [...]
>
> Don't know about the spamassassin issue, but that regexp
> matches <!-- followed by a sequence of 32000 or more characters
> provided that sequence doesn't start with "-->".
>
> ITYM
>
> m|<!--(?:(?!-->).){32000,}|s
>
> That is, you need to look ahead at each character of the sequence
> to look for the closing comment tag, otherwise you'll match on
> <!-- short comment --> <31982 or more characters>

And you may or may not want to require the closing comment tag at the end:

m|<!--(?:(?!-->).){32000,}-->|s
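
To see the difference between the three patterns, here is a rough sketch
using Python's re module rather than SA itself. The assumptions are mine:
the threshold is shrunk from 32000 to 20 so the test strings stay small,
and the names (once, each, closed, hits) are made up for illustration.
Python's (?!...) and (?:...) behave like Perl's for these patterns, and
re.S plays the role of the /s modifier.

```python
import re

# Threshold shrunk from 32000 to 20 so the example strings stay small.
N = 20

# Original rule: the lookahead is checked only once, right after "<!--".
once = re.compile(r"<!--(?!-->).{%d,}" % N, re.S)
# Stephane's fix: the lookahead is checked before every consumed character.
each = re.compile(r"<!--(?:(?!-->).){%d,}" % N, re.S)
# Variant that additionally requires the closing "-->".
closed = re.compile(r"<!--(?:(?!-->).){%d,}-->" % N, re.S)

# A short comment followed by a long run of ordinary text.
short_then_text = "<!-- hi -->" + "x" * 40
# One genuinely long comment, properly closed.
long_comment = "<!--" + "y" * 40 + "-->"
# A long comment that is never closed.
unclosed = "<!--" + "z" * 40


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


for name, text in [("short_then_text", short_then_text),
                   ("long_comment", long_comment),
                   ("unclosed", unclosed)]:
    print(name, hits(once, text), hits(each, text), hits(closed, text))
```

Only the original pattern fires on the short comment followed by text,
and only the version with the trailing --> ignores the unclosed comment.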

Also, because the lookahead runs before every one of those 32000+
characters, this may be an expensive regexp.  If you try it, keep a
close eye on your SA.  If it slows down to a crawl, this rule is
probably the culprit.

-- 
Bowie
