Re: \b confusion

Randy W. Sims Fri, 18 Jun 2004 11:08:34 -0700

[EMAIL PROTECTED] wrote:

According to the principle of \b why is this doing this?
$word = "(HP)";
$word =~ s/[,\]\)\}]\b//;
$word =~ s/\b[,\]\)\}]//;
Since the parentheses are on either side of the boundary, it should take off both of them. Instead the result is: $word = "(HP". It only took off the end paren.
When I used "(HP)," the result is "(HP,"?
A second question. If I want to get rid of any non-numeric and non-alphabetic characters before and after a word, but not what's in the middle of the word (like apostrophes and dashes), what's the simplest way that works?

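\b matches only between a word character (\w) and a non-word character, and the beginning and end of the string count as non-word. In '(HP)' there is therefore no boundary before the '(' or after the ')', so you need something like (?:^|\b) to match at the beginning and (?:\b|$) to match at the end. The (?:...) construct is a non-capturing group.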
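To see where \b actually lands in these strings, here is a minimal sketch that lists every offset at which \b matches. The helper name boundary_positions is made up for this illustration; it is not from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offsets at which \b matches in $str.
sub boundary_positions {
    my ($str) = @_;
    my @offsets;
    # \b is zero-width; with /g Perl moves on after each empty match,
    # so the loop visits every boundary exactly once.
    while ($str =~ /\b/g) {
        push @offsets, pos($str);
    }
    return @offsets;
}

for my $str ('(HP)', '(HP),') {
    print "$str: @{[ boundary_positions($str) ]}\n";
}
# (HP): 1 3
# (HP),: 1 3
```

In both strings the only boundaries sit between a paren and a letter (offsets 1 and 3), which is why a pattern asking for \b after the ')' finds nothing to match.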

Another problem is that in the case of the string '(HP),', you want to match multiple sequential occurrences, so you need to specify that in the regex with a '+' following your character class.

Also, inside a character class none of these characters needs escaping except the closing square bracket, and you don't have to escape that either if you move it to immediately after the opening bracket of the character class.
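As a quick check of that claim, the sketch below compares the escaped class from the question with an unescaped version that puts ']' first. The helper name in_class and the sample characters are only for illustration.

```perl
use strict;
use warnings;

# The escaped class from the question, and the same class written
# without backslashes by placing ']' right after the opening bracket.
my $escaped   = qr/[,\]\)\}]/;
my $unescaped = qr/[],)}]/;

# Hypothetical helper: does the single character $ch match class $re?
sub in_class {
    my ($re, $ch) = @_;
    return $ch =~ /\A$re\z/ ? 1 : 0;
}

for my $ch (',', ']', ')', '}', '(', 'H') {
    printf "%s  escaped=%d  unescaped=%d\n",
        $ch, in_class($escaped, $ch), in_class($unescaped, $ch);
}
```

Both columns should agree on every line: the four closing characters match either class, while '(' and 'H' match neither.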

Finally, the brackets in your second substitution face the wrong way: to strip the opening sequence you need the opening forms ([, (, {), not the closing ones.

If my assumptions are right, you'll end up with something like:

$word =~ s/[],)}]+(?:\b|$)//;
$word =~ s/(?:^|\b)[[,({]+//;
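Here is a small sketch that runs those two substitutions on the strings from the question. The wrapper name strip_brackets is made up. The second helper, trim_nonalnum, is my own assumption about one way to approach the second question (strip everything that is not an ASCII letter or digit from both ends and leave the middle alone); it is not something tested against your real data.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the two substitutions shown above.
sub strip_brackets {
    my ($word) = @_;
    $word =~ s/[],)}]+(?:\b|$)//;   # run of closers before a boundary or the end
    $word =~ s/(?:^|\b)[[,({]+//;   # run of openers after the start or a boundary
    return $word;
}

# Assumed approach to the second question: anchor at each end of the
# string and remove any run of characters that are not ASCII
# letters or digits. Characters in the middle are never touched.
sub trim_nonalnum {
    my ($word) = @_;
    $word =~ s/^[^A-Za-z0-9]+//;
    $word =~ s/[^A-Za-z0-9]+$//;
    return $word;
}

print strip_brackets($_), "\n" for '(HP)', '(HP),', '[HP]';
# HP
# HP
# HP

print trim_nonalnum($_), "\n" for q{"don't",}, '--well-known--';
# don't
# well-known
```

Note that trim_nonalnum only knows about ASCII; if your words can contain accented letters you would want a different class (for example one built on \w or Unicode properties).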

An even better solution is to use the core Text::ParseWords[1] module or the very popular and very useful Regexp::Common[2].

Randy.

1. <http://search.cpan.org/dist/Text-ParseWords/>
2. <http://search.cpan.org/dist/Regexp-Common/>

