Jenda Krynicky wrote:
>
> From: "Jayakumar Rajagopal" <[EMAIL PROTECTED]>
> > hi friends,
> > in regexps, I feel \s and \b behave the same.
> > Can someone send me a counterexample?
>
> $str = "This is a sentence. And this is another.";
>
> @all_words = ($str =~ /\b(\w+)\b/g);
> print join( ', ', @all_words), "\n";
>
> @some_words = ($str =~ /\s(\w+)\s/g);
> print join( ', ', @some_words), "\n";
>
> Quite a big difference, ain't it?
>
> \b matches 0 characters, just at the boundary between a word and a
> non-word character, while \s matches exactly one whitespace character.

Except that the second case doesn't find 'a' or 'this', because the
match for each preceding word chews up the whitespace that would
delimit them. A zero-width look-ahead, which checks for the trailing
whitespace without consuming it, fixes this:

  $str = "This is a sentence. And this is another.";

  @all_words = ($str =~ /\b(\w+)\b/g);
  print join( ', ', @all_words), "\n";

  @some_words = ($str =~ /\s(\w+)(?=\s)/g);
  print join( ', ', @some_words), "\n";

**OUTPUT

  This, is, a, sentence, And, this, is, another
  is, a, And, this, is

'This', 'sentence' and 'another' were omitted in the second case simply
because 'This' isn't preceded by whitespace, and the other two are
followed by a period rather than by whitespace.
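To make the "chews up the whitespace" point concrete, here is a small
sketch that looks at pos() (where the next /g search will start) after
the first match of each pattern. The shorter sample string and the
variable names are just for illustration:

```perl
use strict;
use warnings;

my $str = "This is a sentence.";

# Consuming version: the trailing \s is part of the match, so pos()
# lands just past the space after 'is', i.e. on 'a'.
my $consuming_pos;
if ($str =~ /\s(\w+)\s/g) {
    $consuming_pos = pos($str);
}
pos($str) = undef;    # reset before running the second /g search

# Look-ahead version: (?=\s) only checks for the space, so pos() stops
# right after 'is' and the space is still available to the next match.
my $lookahead_pos;
if ($str =~ /\s(\w+)(?=\s)/g) {
    $lookahead_pos = pos($str);
}

print "consuming: next search starts at '", substr($str, $consuming_pos, 1), "'\n";
print "look-ahead: next search starts at '", substr($str, $lookahead_pos, 1), "'\n";
```

With the consuming pattern the next search begins at 'a', so no \s is
left in front of it; with the look-ahead it begins at the space.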

Something else worth pointing out: \b treats the positions before the
start and after the end of the string as if they were \W (non-word)
characters, so the first case also found 'This' at the start of the string.
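A quick sketch of both points (zero width, and the string ends counting
as non-word) is to count the matches of each pattern on a tiny string.
The sample string "a b" and the variable names are made up for
illustration:

```perl
use strict;
use warnings;

my $text = "a b";

# \b is a zero-width assertion: with /g and no captures, each match is
# an empty string. It matches before and after 'a' and before and after
# 'b', so both ends of the string count as boundaries.
my @boundaries     = ($text =~ /\b/g);
my $boundary_count = scalar @boundaries;      # 4
my $boundary_width = length $boundaries[0];   # 0 characters consumed

# \s consumes exactly one whitespace character per match.
my @spaces      = ($text =~ /\s/g);
my $space_count = scalar @spaces;             # 1
my $space_width = length $spaces[0];          # 1 character consumed

print "\\b matches: $boundary_count, width of each: $boundary_width\n";
print "\\s matches: $space_count, width of each: $space_width\n";
```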

Finally, if you just need to find all the 'word-character' substrings,
then there's no need for either \b or \s, because \w+ is greedy and
already matches each complete run of word characters:

  @all_words = ($str =~ /\w+/g);
  print join( ', ', @all_words), "\n";

**OUTPUT

  This, is, a, sentence, And, this, is, another
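If you want to convince yourself that the \b anchors are redundant
here, a sketch like the following compares both patterns on the same
sample string (the comparison via join is just one convenient check):

```perl
use strict;
use warnings;

my $str = "This is a sentence. And this is another.";

# Each /g search resumes right after the previous match, which ended at
# a non-word character (or the start), and greedy \w+ runs to the end
# of the word, so both patterns pick out the same substrings.
my @with_b    = ($str =~ /\b(\w+)\b/g);
my @without_b = ($str =~ /\w+/g);

my $same = join(',', @with_b) eq join(',', @without_b);
print $same ? "identical\n" : "different\n";
```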

HTH,

Rob



-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
<http://learn.perl.org/> <http://learn.perl.org/first-response>

