Hi,

On Wed, May 28, 2014 at 5:36 PM, Karsten Bräckelmann <guent...@rudersport.de>
wrote:
>
> On Wed, 2014-05-28 at 14:16 -0400, Alex wrote:
> > I'm trying to write a body rule that will catch an email exactly
> > containing any number of characters up to 15, followed by a URI,
> > followed by any number of characters, up to 15. My attempt has failed
> > miserably, and hoped someone could help.
> >
> > body   LOC_SHORT_BODY_URI      m{^.{0,15}(https?://.{1,50}).{0,15}$}
> >
> > This catches pretty much everything and I can't figure out why.
>
> Oh, come on, Alex. We've had that topic just recently in your "Help with
> short bodys with URLs" thread. Which wasn't the first time either...

I know, I know. I actually started with that, but it became too complex for
me to modify when a particular false negative came in with enough preceding
HTML to make our rule fail. So I thought writing a simple "body" rule would
be easy enough, with the help of the team here.

> The "body" are all textual parts, rendered and normalized. Consecutive
> whitespace is condensed to a single space. An empty line (double
> newline) delimits paragraphs. The Subject becomes the first paragraph of
> the body.
>
> The regex pattern is matched against the "body" one paragraph at a time.
>
> A body rule with beginning and end anchors /^ $/ as you posted matches
> complete paragraphs. Not the full body.

I don't think I realized multiple buffers weren't considered simultaneously.
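
To illustrate the paragraph-at-a-time behaviour described above, here is a
minimal Python sketch. It is not SpamAssassin's code: split_paragraphs() and
rule_hits() are hypothetical stand-ins that assume paragraphs are separated
by blank lines and that whitespace is collapsed. The pattern is the one from
my LOC_SHORT_BODY_URI attempt.

```python
import re

# The anchored pattern from the LOC_SHORT_BODY_URI attempt.
PATTERN = re.compile(r"^.{0,15}(https?://.{1,50}).{0,15}$")


def split_paragraphs(rendered_body):
    """Hypothetical stand-in for body rendering: split on blank lines
    and collapse consecutive whitespace to a single space."""
    paragraphs = re.split(r"\n\s*\n", rendered_body.strip())
    return [" ".join(p.split()) for p in paragraphs]


def rule_hits(rendered_body):
    """Return True if the pattern matches any single paragraph."""
    return any(PATTERN.search(p) for p in split_paragraphs(rendered_body))


# A long message that happens to contain one short paragraph with a URI.
long_mail = (
    "Quarterly report\n\n"
    + "This is a long paragraph of ordinary text. " * 10
    + "\n\nSee http://example.com/report for details\n\n"
    + "Another long paragraph follows here. " * 10
)

print(rule_hits(long_mail))
```

Under those assumptions the anchors apply to each paragraph on its own, so a
long message still hits as soon as any one short paragraph contains a URI.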

> > Any help on how to do this more efficiently and effectively would be
> > greatly appreciated.
>
> First, you will need to use a rawbody rule.
>
>   rawbody __SHORT_BODY_URI  m~^.{0,15} https?://[^ ]+ .{0,15}$~
>
> Entirely untested, and favoring simplicity and readability over
> correctness. (The simple spaces should better be /\s?/ any whitespace,
> optional.)
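
For reference, a rough Python translation of the suggested pattern with the
literal spaces replaced by optional whitespace, as the parenthetical above
recommends. The name is_short_uri_line() and the sample strings are made up
for illustration, and Python's re module is only an approximation of how
SpamAssassin applies a rawbody rule.

```python
import re

# Suggested pattern with \s? in place of the literal spaces and
# explicit {0,15} bounds on both sides of the URI.
SHORT_URI = re.compile(r"^.{0,15}\s?https?://\S+\s?.{0,15}$")


def is_short_uri_line(line):
    """Return True if the line is a URI with at most a little text around it."""
    return SHORT_URI.search(line) is not None


samples = [
    "Look: http://example.com/x thanks",
    "http://example.com/",
    "x" * 40 + " http://example.com/",
    '<a href="http://example.com/">click here for the best offer</a>',
]

for s in samples:
    print(is_short_uri_line(s), repr(s[:50]))
```

In this sketch the last sample fails because the markup and link text after
the URI run past fifteen characters.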

I tested this briefly on my sample, and it doesn't match, because __CHUNK
hits twice. The HTML part has more than fifteen characters both before and
after the URI, just as LOC_SHORT doesn't match because the rawbody is
larger than 200 characters.

Is it possible to only match on text/plain instead of text/html?

> That all said, the rule you are currently trying to write pretty much
> sounds like the "has URI and short body" LOC_SHORT rule we discussed
> back in Oct 2013...

So this doesn't match, just as the LOC_SHORT rule doesn't.

Thanks,
Alex
