On Wed, 2014-05-28 at 14:16 -0400, Alex wrote:
> I'm trying to write a body rule that will catch an email exactly
> containing any number of characters up to 15, followed by a URI,
> followed by any number of characters, up to 15. My attempt has failed
> miserably, and hoped someone could help.
> 
> body   LOC_SHORT_BODY_URI      m{^.{0,15}(https?://.{1,50}).{0,15}$}
> 
> This catches pretty much everything and I can't figure out why.

Oh, come on, Alex. We've had that topic just recently in your "Help with
short bodys with URLs" thread. Which wasn't the first time either...

The "body" consists of all textual parts, rendered and normalized.
Consecutive whitespace is condensed to a single space. An empty line
(double newline) delimits paragraphs. The Subject becomes the first
paragraph of the body.

The regex pattern is matched against the "body" one paragraph at a time.

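A body rule with beginning and end anchors /^ $/ as you posted matches
a complete paragraph, not the full body. Any message that contains a
single short paragraph with a URI will hit, no matter how long the rest
of the message is.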
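
To illustrate the per-paragraph matching, here is a rough Python sketch.
It is not SpamAssassin code: render_paragraphs() and rule_hits() are
hypothetical helpers that only imitate the behaviour described above
(Subject first, paragraphs split on empty lines, whitespace condensed),
and Python's re module stands in for Perl's regex engine.

```python
import re

def render_paragraphs(subject, text):
    """Crude stand-in for the rendered body: the Subject comes first,
    paragraphs are split on empty lines, whitespace runs are condensed."""
    parts = [subject] + re.split(r"\n\s*\n", text)
    return [re.sub(r"\s+", " ", p).strip() for p in parts if p.strip()]

# The pattern from the original question, unchanged.
pattern = re.compile(r"^.{0,15}(https?://.{1,50}).{0,15}$")

def rule_hits(subject, text):
    """The rule fires if ANY single paragraph matches the anchored pattern."""
    return any(pattern.search(p) for p in render_paragraphs(subject, text))

long_mail = (
    "First paragraph with plenty of ordinary text, far more than fifteen chars.\n"
    "\n"
    "See http://example.com/\n"
    "\n"
    "A long closing paragraph that has nothing to do with the link at all."
)

print(rule_hits("Hello", long_mail))  # the short middle paragraph matches
```

The message as a whole is long, yet the anchored pattern still hits
because the middle paragraph on its own satisfies it.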


> Any help on how to do this more efficiently and effectively would be
> greatly appreciated.

First, you will need to use a rawbody rule.

  rawbody __SHORT_BODY_URI  m~^.{0,15} https?://[^ ]+ .{0,15}$~

Entirely untested, and favoring simplicity and readability over
correctness. (The literal spaces would better be /\s?/, that is,
optional whitespace of any kind.)

Note the regex pattern for the URI. Unlike you, I don't limit its
length, but simply let it consume everything up to a natural URI end:
whitespace. That way I also ensure the trailing string is no longer
than 15 chars. Your regex doesn't distinguish between the URI and the
trailing part, effectively allowing a much longer string after a short
URI.
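
The difference can be checked with a small Python sketch. This is only
an approximation: Python's re module is used in place of Perl, the
sample strings are made up, and the second pattern is written with
{0,15} quantifiers.

```python
import re

# Pattern from the original question: the URI part and the trailing part
# are both "any character", so together they may cover up to 65 chars.
original = re.compile(r"^.{0,15}(https?://.{1,50}).{0,15}$")

# Whitespace-delimited variant: the URI ends at the first space, so the
# trailing part really is limited to 15 chars.
delimited = re.compile(r"^.{0,15} https?://[^ ]+ .{0,15}$")

def matches(pattern, text):
    """Return True if the pattern matches the given single-line text."""
    return pattern.search(text) is not None

short_tail = "click here http://example.com/x now please"
long_tail = "click here http://example.com/x " + "a" * 40  # 40 trailing chars

print(matches(original, short_tail), matches(delimited, short_tail))  # True True
print(matches(original, long_tail), matches(delimited, long_tail))    # True False
```

With a short URI, the original pattern still accepts 40 trailing
characters, because .{1,50} absorbs whatever the URI does not use.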

Since rawbody rules are matched against chunks of 1-2 kByte, we also
need to make sure there is no chunk other than the one matching
__SHORT_BODY_URI.

Since you already have a rule identifying short messages <= 200 chars,
we can simply reuse it here.

  rawbody __RB_GT_200  /^.{201}/s
  meta    __RB_LE_200  !__RB_GT_200

Another approach would be to actually ensure there is only a single
chunk. And finally, meta them together.

  rawbody __CHUNK  /^./
  tflags  __CHUNK  multiple

  meta    SHORT_BODY_URI  __SHORT_BODY_URI && (__CHUNK == 1)


That all said, the rule you are currently trying to write pretty much
sounds like the "has URI and short body" LOC_SHORT rule we discussed
back in Oct 2013...


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
