-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Dan wrote:
>> Hmmm, four DIVs, near each other, each with a single alpha and
>> whitespace. May not be what you are trying to catch, but it's the only
>> real pattern I can see from that snippet.
>>
>> rawbody T_4_DODGY_DIVS m'<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>.{1,40}?<DIV>\s+\w</DIV>'i
>>
>> describe T_4_DODGY_DIVS Testing...
>> score T_4_DODGY_DIVS 0.01
>
> Interesting, instead of asking for the count, you are actually showing it
> how many. Scaled up to 30 and adding space variations, it would look like:
>
>
[snipped]

I use this style to catch a couple of common text-formatting oddities
caused by machine-generated input; see:

http://fukka.co.uk/sa-rules/local/textstyles.cf

Thinking about it, the repeated part groups fairly well, so a single
quantified group should work:

rawbody T_30_DODGY_DIVS m'(?:<DIV>\s{0,}?[\$%\w]\s{0,}?</DIV>.{1,40}?){30}'i

Stick with rawbody; you don't need full. Also, you'll probably want the
match to be case-insensitive (the trailing i), and \s{0,}? (equivalent
to \s*?) to match zero or more whitespace characters.
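If you want to sanity-check the quantified group before putting it in a
rule file, here is a rough sketch in Python rather than SpamAssassin
itself. The helper name dodgy_divs_pattern and the sample lines are made
up for illustration, and Python's re module only approximates Perl's
regex engine, so treat it as a quick check, not proof of how the rule
will behave on real mail.

```python
import re


def dodgy_divs_pattern(count: int) -> re.Pattern:
    """Build the quantified-group regex for `count` single-character DIVs.

    Hypothetical helper: it mirrors the T_30_DODGY_DIVS expression so the
    repeat count can be varied while testing.
    """
    # One DIV holding a single $, % or word character, optional whitespace.
    one_div = r"<DIV>\s{0,}?[\$%\w]\s{0,}?</DIV>"
    # Between 1 and 40 characters of anything (except newline) after each DIV.
    gap = r".{1,40}?"
    return re.compile(rf"(?:{one_div}{gap}){{{count}}}", re.IGNORECASE)


# Made-up sample input: 30 lower-case DIVs, each followed by a <br>.
sample = "<div> a </div><br>" * 30

print(bool(dodgy_divs_pattern(30).search(sample)))
print(bool(dodgy_divs_pattern(30).search("<div> a </div><br>" * 29)))
```

One thing this makes visible: because the .{1,40}? gap sits inside the
repeated group, the 30th DIV must also be followed by at least one
character on the same line. The hand-expanded four-DIV version has no
such trailing gap.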
C.
- --
Craig McLean http://fukka.co.uk
[EMAIL PROTECTED] Where the fun never starts
Powered by FreeBSD, and GIN!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
iD8DBQFEaQ+fMDDagS2VwJ4RAiJdAKDfS/Nila7mMDnG3FBBQ10gRX0oHQCgiXt9
vzH0Cu0GJrL/Nc5gxJa1D/c=
=Rh9D
-----END PGP SIGNATURE-----