Jenda Krynicky wrote:
>
> From: "Jayakumar Rajagopal" <[EMAIL PROTECTED]>
> > hi friends,
> > in regexp, I feel \s and \b behaves same.
> > can someone send me a contradiction ?
>
> $str = "This is a sentence. And this is another.";
>
> @all_words = ($str =~ /\b(\w+)\b/g);
> print join( ', ', @all_words), "\n";
>
> @some_words = ($str =~ /\s(\w+)\s/g);
> print join( ', ', @some_words), "\n";
>
> Quite a big difference ain't it?
>
> \b matches 0 characters just between a word and non-word character,
> while \s matches one space.

Except that the second case doesn't find 'a' or 'this', because the match
for the preceding word consumes the whitespace that would delimit them.
A zero-width look-ahead fixes this:

$str = "This is a sentence. And this is another.";

@all_words = ($str =~ /\b(\w+)\b/g);
print join( ', ', @all_words), "\n";

@some_words = ($str =~ /\s(\w+)(?=\s)/g);
print join( ', ', @some_words), "\n";

**OUTPUT

This, is, a, sentence, And, this, is, another
is, a, And, this, is

'sentence' and 'another' were omitted in the second case simply because
they weren't followed by whitespace, and 'This' because it wasn't preceded
by any.

Something else worth pointing out: \b considers the beginning or end of
the string to be equivalent to a \W (non-word) match, so the first case
also found 'This' at the start of the string.

Finally, if you just need to find all the 'word-character' substrings
then there's no need for either \b or \s:

@all_words = ($str =~ /\w+/g);
print join( ', ', @all_words), "\n";

**OUTPUT

This, is, a, sentence, And, this, is, another

HTH,

Rob
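
A further sketch, not from the thread itself: if what you want is
whitespace-delimited words, but with the start and end of the string
counting as delimiters (the way \b treats them as non-word), one option is
a pair of negative look-arounds. (?<!\S) means "not preceded by a
non-whitespace character" and (?!\S) means "not followed by one", so both
succeed at the string edges. The variable names @with_b and @ws_delimited
are only for illustration.

```perl
use strict;
use warnings;

my $str = "This is a sentence. And this is another.";

# \b is zero-width and also matches at the very start and end of the string.
my @with_b = ($str =~ /\b(\w+)\b/g);

# Zero-width whitespace tests: neither look-around consumes a character,
# so adjacent words no longer steal each other's delimiter, and the
# string edges count as delimiters too.
my @ws_delimited = ($str =~ /(?<!\S)(\w+)(?!\S)/g);

print join( ', ', @with_b), "\n";
print join( ', ', @ws_delimited), "\n";
```

This should print the full word list on the first line and
"This, is, a, And, this, is" on the second: 'This' is now picked up, while
'sentence' and 'another' are still skipped because each is followed by a
period rather than whitespace.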