On Thu, 2011-04-21 at 14:55 -0800, Kevin Miller wrote:
> I did get it to work from the CLI, and wrote the following rule:
> 
> body      CBJ_GiveMeABreak  /\["<br>"]{5,}/

This is still wrong. Something that has been mentioned, but not properly
explained to you, is the char class, denoted by square brackets. The
RE /[bar]/ will match any single char in the class, that is either a "b",
an "a" or an "r".

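To make the char class idea concrete, here is a minimal sketch. It uses
Python's re module purely as a stand-in for Perl (an assumption on my part;
for a plain class like this the two behave the same), and "zebra" is just a
made-up test string.

```python
import re

# /[bar]/ is a char class: it matches ONE char that is "b", "a" or "r".
char_class = re.compile(r"[bar]")

# Every single-char hit in "zebra", in order of appearance.
hits = char_class.findall("zebra")
print(hits)

# Without the brackets, /bar/ is the literal three-char string "bar".
literal = re.compile(r"bar")
print(literal.search("zebra") is not None)     # "zebra" does not contain "bar"
print(char_class.search("zebra") is not None)  # but it contains chars of the class
```
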
In this case (the rule above) it is NOT a char class though, because you
backslash escaped the opening square bracket, turning it into the literal
char itself. The reason the RE (the part inside the slash / delimiters)
DID work with grep on the command line is that the backslash escaped the
opening square bracket for your shell, preventing your *shell* from
interpreting it. The shell then drops the backslash, so the RE passed to
your grep features a bare square bracket, turning it into a char class
again. Multiple levels of escaping. If you want to test an RE with grep,
you are much better off 'single quoting' the entire RE, rather than
escaping single chars. This will prevent such issues.

grep on your shell was looking for any char of the class [<>br], 5 or
more times in a row. That matches the string '<br><'.
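
A rough way to see the two levels of interpretation, sketched in Python.
Assumptions: shlex.split() stands in for the shell here, and it only models
POSIX-style quote removal (no redirection, globbing or brace expansion, so a
real shell may do more to the string); Python's re stands in for grep's RE
engine.

```python
import re
import shlex

# The pattern argument as it was typed on the command line.
typed = r'\["<br>"]{5,}'

# Quote removal: the backslash and the double quotes are consumed by the
# "shell" and never reach grep.
(seen_by_grep,) = shlex.split(typed)
print(seen_by_grep)

# What is left over is a char class with a quantifier, which is happy
# with any five chars out of < b r > in a row.
m = re.search(seen_by_grep, "<br><")
print(m.group(0) if m else None)

# Single-quoting the entire RE passes it through untouched.
(quoted,) = shlex.split(r"""'\["<br>"]{5,}'""")
print(quoted == typed)
```
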

For perl, with one less interpretation of the string (no shell), it
looks for the literal string '["<br>"' followed by five or more closing
square brackets, e.g. '["<br>"]]]]]'.

Yes, the double-quotes prevented your shell from interpreting < as
input redirection, like it was breaking your command in the OP. Without
the shell, a double-quote is just a char, though. Also, the {5,} operates
only on the thing directly in front of it -- which is a single char here
(the closing square bracket), because you did not (?:) group the leading
sub-RE.
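
The same RE without a shell in between can be checked with a small sketch.
Again Python's re is only a stand-in for Perl (my assumption; both treat a
lone closing bracket as a literal char), and the "ab" pattern is a made-up
illustration of how the quantifier binds.

```python
import re

# The rule's RE exactly as written, no shell involved.
rule = re.compile(r'\["<br>"]{5,}')

# Literal [ then "<br>" then five (or more) closing square brackets.
print(rule.search('["<br>"]]]]]') is not None)

# Five actual <br> strings in a row do NOT match.
print(rule.search("<br>" * 5) is not None)

# A quantifier binds to the single char in front of it:
print(re.fullmatch(r"ab{3}", "abbb") is not None)    # "a", then "b" three times
print(re.fullmatch(r"ab{3}", "ababab") is not None)  # not "ab" three times
```
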


What you want is the string '<br>', repeated five times (or more). For
the quantifier to apply to the whole string, you need to group it.

  /(?:<br>){5}/
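
A quick sanity check of the grouped RE, sketched in Python as a stand-in for
Perl (an assumption; the (?:) non-capturing group works the same in both).
The test strings are made up.

```python
import re

# Grouped: the quantifier applies to the whole string "<br>".
grouped = re.compile(r"(?:<br>){5}")

# Four repeats are not enough, five or six are.
results = {n: grouped.search("<br>" * n) is not None for n in (4, 5, 6)}
print(results)

# Ungrouped, {5} only repeats the ">" in front of it.
ungrouped = re.compile(r"<br>{5}")
print(ungrouped.search("<br" + ">" * 5) is not None)
print(ungrouped.search("<br>" * 5) is not None)
```
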

Besides the above, do not use {5,} as a quantifier, UNLESS there is
something after that string you also want to match. If you do not want
to match anything after it, "exactly 5 times" {5} will hit in exactly
the same cases as "five or more" {5,} -- the latter just unnecessarily
keeps on trying to match more.
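
To illustrate, a small sketch comparing the two quantifiers, once more with
Python's re as an assumed stand-in for Perl. match_len() is a hypothetical
helper just for this example; it returns how many chars got matched, or None
if there was no hit.

```python
import re

exactly = re.compile(r"(?:<br>){5}")
at_least = re.compile(r"(?:<br>){5,}")

def match_len(rx, repeats):
    """Length of the match in a string with `repeats` <br>s, or None."""
    m = rx.search("x" + "<br>" * repeats + "y")
    return len(m.group(0)) if m else None

# Both REs hit (or miss) on the very same inputs; {5,} merely keeps
# consuming further repeats when there are more than five.
for n in (4, 5, 8):
    print(n, match_len(exactly, n), match_len(at_least, n))
```
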


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
