On Tue, 16 May 2006, Craig McLean wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> [snipped]
>
> I use this style to catch a couple of common text formatting oddities
> caused by machine-generated input, see:
> http://fukka.co.uk/sa-rules/local/textstyles.cf
>
> Thinking about it, this stuff will nest fairly well, so this should work:
>
> rawbody T_30_DODGY_DIVS m'(?:<DIV>\s{0,}?[\$%\w]\s{0,}?</DIV>.{1,40}?){30}'i
>
> Stick with rawbody, you don't need full. Also, you'll probably want
> case-insensitive, and \s{0,}? to match zero or more whitespace.

The only problem with that is that "rawbody" processes the original message one
line at a time, unlike "full" or "body", which concatenate the whole
message into one large string. So if you're looking for some
characteristic of a message that is spread across multiple lines of
input, you cannot use "rawbody".

Thus you are -very- unlikely to find 30 repetitions of your pattern
within any single line of the input message.
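
To illustrate the difference, here is a rough Python sketch. It is not
SpamAssassin code: the function names and the sample message are made up
for the demonstration, and the pattern is a straight port of the one
quoted above with "dot matches newline" switched on. It only shows that
a pattern needing 30 repetitions cannot match when each line is tested
separately, but can match when the whole text is tested as one string.

```python
import re

# Port of the quoted pattern. re.I = case-insensitive; re.S lets '.' match
# newlines, which the whole-string test needs to step from one line to the next.
PATTERN = re.compile(r"(?:<DIV>\s{0,}?[$%\w]\s{0,}?</DIV>.{1,40}?){30}", re.I | re.S)


def matches_per_line(message: str) -> bool:
    """Hypothetical stand-in for line-at-a-time matching."""
    return any(PATTERN.search(line) for line in message.splitlines())


def matches_whole(message: str) -> bool:
    """Hypothetical stand-in for matching against one concatenated string."""
    return PATTERN.search(message) is not None


# Made-up sample: 30 one-character DIVs, one per line.
sample = "".join(f"<div>{chr(ord('a') + i % 26)}</div>\n" for i in range(30))

print("per line:", matches_per_line(sample))
print("whole:   ", matches_whole(sample))
```

With one DIV per line, no single line can hold 30 repetitions, so the
per-line test fails while the whole-string test succeeds.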

This 'feature' of rawbody has already been the subject of various threads
on this list.

-- 
Dave Funk                                  University of Iowa
<dbfunk (at) engineering.uiowa.edu>        College of Engineering
319/335-5751   FAX: 319/384-0549           1256 Seamans Center
Sys_admin/Postmaster/cell_admin            Iowa City, IA 52242-1527
#include <std_disclaimer.h>
Better is not better, 'standard' is better. B{
