On 4/2/2012 12:58 PM, Stephane Chazelas wrote:
> 2012-04-02 12:40:27 -0400, Kris Deugau:
>> Can anyone point out what bit of stupidity I'm committing in trying
>> to use this:
>>
>> rawbody OVERSIZE_COMMENT m|<!--(?!-->).{32000,}|s
>>
>> to match messages that are mostly very very long HTML comment(s)?
>>
>> Testing the same regex against the whole raw message outside of SA
>> seems to fire just fine.
> [...]
>
> Don't know about the spamassassin issue, but that regexp
> matches <!-- followed by a sequence of 32000 or more characters
> provided that sequence doesn't start with "-->".
>
> ITYM
>
> m|<!--(?:(?!-->).){32000,}|s
>
> That is, you need to look ahead at each character of the sequence
> to look for the closing comment tag, otherwise you'll match on
> <!-- short comment --> <31982 or more characters>
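The difference between the two patterns is easy to see outside of SA. This is a sketch, not something from the thread: it uses Python's `re` module, on the assumption that it treats `(?!...)`, `(?:...)`, `{n,}` and the `s` (DOTALL) flag the same way Perl does here, and the two sample strings are made up for illustration. It only tests the regexes themselves, not how SpamAssassin feeds rawbody text to them.

```python
import re

# Pattern from the original post: (?!-->) is checked only once, right after
# "<!--", so .{32000,} is free to run straight past a closing "-->".
ORIGINAL = re.compile(r"<!--(?!-->).{32000,}", re.S)

# Suggested fix: the lookahead is re-checked before every character consumed,
# so all 32000+ characters must sit inside one unclosed stretch of comment.
FIXED = re.compile(r"<!--(?:(?!-->).){32000,}", re.S)

# Hypothetical sample: a short, closed comment followed by 31982 ordinary
# characters (18 + 31982 = 32000 characters after the "<!--").
short_comment = "<!-- short comment -->" + "x" * 31982

# Hypothetical sample: one comment whose body alone is 32000 characters.
long_comment = "<!--" + "y" * 32000 + "-->"

for name, sample in [("short_comment", short_comment), ("long_comment", long_comment)]:
    print(name, bool(ORIGINAL.search(sample)), bool(FIXED.search(sample)))
# short_comment True False
# long_comment True True
```

The first line of output is exactly the false positive described above: the original pattern fires on a short comment followed by enough ordinary text, while the per-character lookahead version does not.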
And you may or may not want to require a closing comment tag at the end:

m|<!--(?:(?!-->).){32000,}-->|s

Also, because this runs a lookahead at every character, it may be an
expensive regexp. If you try it, keep a close eye on your SA scan times.
If it slows to a crawl, this rule is probably the culprit.

--
Bowie
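To illustrate the trade-off with and without the trailing `-->`, here is a small sketch, again using Python's `re` as a stand-in for Perl's regex engine (an assumption, not how SA runs it) and made-up sample strings. Requiring the closing tag means a comment that is never closed, for example in a message cut off mid-comment, no longer matches.

```python
import re

# Without the closing tag: any 32000+ character run of unclosed comment matches.
OPEN_ENDED = re.compile(r"<!--(?:(?!-->).){32000,}", re.S)

# With the closing tag: the 32000+ characters must be followed by "-->".
CLOSED = re.compile(r"<!--(?:(?!-->).){32000,}-->", re.S)

# Hypothetical samples: one properly closed long comment, one never closed.
closed_comment = "<!--" + "y" * 32000 + "-->"
unclosed_comment = "<!--" + "y" * 32000

for name, sample in [("closed", closed_comment), ("unclosed", unclosed_comment)]:
    print(name, bool(OPEN_ENDED.search(sample)), bool(CLOSED.search(sample)))
# closed True True
# unclosed True False
```

Which one you want depends on whether an unterminated giant comment should count as spammy; the regex cost is about the same either way, since both versions still perform the per-character lookahead.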