Dan wrote:
>> Hmmm, four DIVs, near each other, each with a single alpha and
>> whitespace. May not be what you are trying to catch, but it's the only
>> real pattern I can see from that snippet.
>>
>> rawbody T_4_DODGY_DIVS
>> m'<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>'i
>>
>> describe T_4_DODGY_DIVS Testing...
>> score T_4_DODGY_DIVS    0.01
> 
> Interesting: instead of asking for a count, you are actually spelling out
> each repetition.  Scaled up to 30 and adding space variations, it would look like:
> 
> 
[snipped]

I use this style of rule to catch a couple of common text-formatting oddities
caused by machine-generated input; see:
http://fukka.co.uk/sa-rules/local/textstyles.cf

Thinking about it, the per-DIV pattern nests well inside a non-capturing
group with a repeat count, so this should work:

rawbody T_30_DODGY_DIVS m'(?:<DIV>\s{0,}?[\$%\w]\s{0,}?</DIV>.{1,40}?){30}'i

Stick with rawbody; you don't need full. You'll also probably want the
match to be case-insensitive (the trailing i), and \s{0,}? to match zero
or more whitespace characters.
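To sanity-check that the nested form behaves like writing the DIV pattern out 30 times, here is a small sketch that transcribes the rule into Python's `re` module. It assumes Perl and Python treat the constructs used here the same way (non-capturing groups, `{n}` counts, lazy `{0,}?` and `{1,40}?`, and `.` not matching a newline), and it does not model how SpamAssassin feeds rawbody text to a rule. `UNIT`, `nested_rule`, `STRICT_FOUR` and `make_divs` are hypothetical names, and the HTML is made-up test data.

```python
import re

# One repetition of the nested rule: a DIV holding a single character from
# [$%\w], optional whitespace either side, then a 1-40 character gap.
UNIT = r"<DIV>\s{0,}?[\$%\w]\s{0,}?</DIV>.{1,40}?"


def nested_rule(count):
    """Compile UNIT repeated `count` times, as in the T_30_DODGY_DIVS rule.

    re.IGNORECASE stands in for the trailing i of SpamAssassin's m'...'i.
    """
    return re.compile("(?:" + UNIT + "){" + str(count) + "}", re.IGNORECASE)


# The quoted four-DIV rule written out longhand; its \s+ demands whitespace.
STRICT_FOUR = re.compile(
    r"<DIV>\s+\w</DIV>" + r".{1,40}?<DIV>\s+\w</DIV>" * 3,
    re.IGNORECASE,
)


def make_divs(count, sep="&nbsp;" * 4):
    """Hypothetical test body: `count` one-character DIVs in mixed case, some
    padded with spaces, each followed by `sep`.

    The 24-character default separator means each 1-40 character gap can only
    reach the next DIV, which keeps Python's backtracking search quick when
    the rule fails to match.
    """
    parts = []
    for i in range(count):
        open_tag, close_tag = ("<div>", "</div>") if i % 2 else ("<DIV>", "</DIV>")
        pad = " " if i % 3 == 0 else ""
        parts.append(open_tag + pad + "abc$%"[i % 5] + pad + close_tag + sep)
    return "".join(parts)


if __name__ == "__main__":
    rule30 = nested_rule(30)
    print(bool(rule30.search(make_divs(30))))  # True
    print(bool(rule30.search(make_divs(29))))  # False

    # No whitespace inside the DIVs: only the zero-or-more form matches.
    unpadded = make_divs(4).replace(" ", "")
    print(bool(STRICT_FOUR.search(unpadded)))     # False: \s+ needs whitespace
    print(bool(nested_rule(4).search(unpadded)))  # True: \s{0,}? allows none
```

One side effect of nesting: because the `.{1,40}?` gap sits inside the repeated group, the 30th DIV must also be followed by at least one character, which whatever markup follows the DIVs would normally supply.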

C.
--
Craig McLean            http://fukka.co.uk
[EMAIL PROTECTED]       Where the fun never starts
        Powered by FreeBSD, and GIN!