Hi,

On Wed, May 28, 2014 at 5:36 PM, Karsten Bräckelmann <guent...@rudersport.de> wrote:
>
> On Wed, 2014-05-28 at 14:16 -0400, Alex wrote:
> > I'm trying to write a body rule that will catch an email exactly
> > containing any number of characters up to 15, followed by a URI,
> > followed by any number of characters, up to 15. My attempt has failed
> > miserably, and hoped someone could help.
> >
> > body LOC_SHORT_BODY_URI m{^.{0,15}(https?://.{1,50}).{0,15}$}
> >
> > This catches pretty much everything and I can't figure out why.
>
> Oh, come on, Alex. We've had that topic just recently in your "Help with
> short bodys with URLs" thread. Which wasn't the first time either...

I know, I know. I actually started with that, but it became too complex for me to modify when a particular false negative came in with enough preceding HTML to make our rule fail. So I thought writing a simple "body" rule would be easy enough, with the help of the team here.

> The "body" are all textual parts, rendered and normalized. Consecutive
> whitespace is condensed to a single space. An empty line (double
> newline) delimits paragraphs. The Subject becomes the first paragraph of
> the body.
>
> The regex pattern is matched against the "body" one paragraph at a time.
>
> A body rule with beginning and end anchors /^ $/ as you posted matches
> complete paragraphs. Not the full body.

I don't think I realized that the paragraphs are matched one at a time rather than as a single buffer.

> > Any help on how to do this more efficiently and effectively would be
> > greatly appreciated.
>
> First, you will need to use a rawbody rule.
>
> rawbody __SHORT_BODY_URI m~^.{,15} https?://[^ ]+ .{,15}$~
>
> Entirely untested, and favoring simplicity and readability over
> correctness. (The simple spaces should better be /\s?/ any whitespace,
> optional.)

I tested this briefly on my sample, and it doesn't match because __CHUNK hits twice. The HTML section has more than fifteen characters before and after the URI, just as LOC_SHORT doesn't match because the rawbody is larger than 200 characters. Is it possible to match only on text/plain instead of text/html?

> That all said, the rule you are currently trying to write pretty much
> sounds like the "has URI and short body" LOC_SHORT rule we discussed
> back in Oct 2013...

So this doesn't match, for the same reason the LOC_SHORT rule doesn't.

Thanks,
Alex
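The two matching behaviours discussed in this thread can be reproduced outside SpamAssassin with a minimal Python sketch. The helper names (normalize, body_paragraphs, body_rule_hits, rawbody_hits) and the sample messages are hypothetical. "Rendering" here only condenses whitespace and does not strip HTML. The rawbody case treats a whole decoded part as one string, which is a simplification of what SpamAssassin actually does. The rawbody pattern applies Karsten's suggested `\s?` tweak and writes `{,15}` as `{0,15}`, because Perl releases before 5.34 do not parse `{,n}` as a quantifier.

```python
import re

# Alex's original body-rule pattern, unchanged. Without re.MULTILINE or
# re.DOTALL, ^, $ and . behave here as they do in Perl without /m or /s.
BODY_PATTERN = re.compile(r"^.{0,15}(https?://.{1,50}).{0,15}$")

# Karsten's untested rawbody pattern with his suggested tweak applied: the
# literal spaces become \s? (optional whitespace). {,15} is written {0,15}
# because Perl before 5.34 does not treat {,n} as a quantifier.
RAWBODY_PATTERN = re.compile(r"^.{0,15}\s?https?://[^ ]+\s?.{0,15}$")


def normalize(text: str) -> str:
    """Condense runs of whitespace to a single space (simplified rendering)."""
    return re.sub(r"\s+", " ", text).strip()


def body_paragraphs(subject: str, body: str) -> list[str]:
    """Simplified model of the rendered "body": the Subject is the first
    paragraph, and an empty line delimits the remaining paragraphs."""
    paragraphs = [subject] + re.split(r"\n\s*\n", body)
    return [normalize(p) for p in paragraphs if p.strip()]


def body_rule_hits(subject: str, body: str) -> bool:
    """A body rule is tested one paragraph at a time, so ANY short
    paragraph containing a URI is enough for a hit."""
    return any(BODY_PATTERN.search(p) for p in body_paragraphs(subject, body))


def rawbody_hits(part: str) -> bool:
    """Test the rawbody pattern against one decoded, unrendered part.
    Simplification: the whole part is treated as a single string."""
    return RAWBODY_PATTERN.search(part) is not None


if __name__ == "__main__":
    long_body = (
        "Dear customer, this newsletter has plenty of text in it.\n"
        "\n"
        "Visit http://example.com/offer today\n"
        "\n"
        "More paragraphs follow, each well beyond fifteen characters.\n"
    )
    # The short middle paragraph alone satisfies the anchors.
    print(body_rule_hits("Newsletter", long_body))  # True
    # Matched against the whole normalized body, the same pattern fails.
    print(BODY_PATTERN.search(normalize(long_body)) is not None)  # False

    plain = "Check this http://example.com/x out"
    html = '<p>Check this <a href="http://example.com/x">link</a></p>'
    print(rawbody_hits(plain))  # True
    # 23 characters of markup and text precede the URI in the HTML part.
    print(rawbody_hits(html))  # False
```

The first pair of results shows why the anchored body rule "catches pretty much everything": one short paragraph with a link is enough. The second pair shows the failure Alex reports. Once HTML markup surrounds the URI, more than fifteen characters precede it, and the rawbody pattern no longer matches.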