RE: Matching a pattern only once

Bob Showalter Mon, 17 Sep 2001 05:11:04 -0700


> -----Original Message-----
> From: Jason Tiller [mailto:[EMAIL PROTECTED]]
> Sent: Friday, September 14, 2001 5:41 PM
> To: '[EMAIL PROTECTED]'
> Subject: RE: Matching a pattern only once
> 
> 
> Hello, Again, Bob, :)
> 
> On Fri, 14 Sep 2001, Bob Showalter wrote:
> 
> > You can use look-behind assertion:
> >
> >    /(?<!~)~$/
> 
> > Which means, match a tilde, not preceded by a tilde, anchored to the
> > end of the string. This will match:
> 
> >    foo~
> >    ~
> >
> > But not:
> >
> >    foo~~
> >    ~~
> 
> I'm trying to understand what this means (I'm a beginner!).  I've gone
> through the perl RE tutorial (perldoc perlretut) and my head is now
> spinning like a top out of control.  Sheesh, these things are
> complicated!!
> 
> So, I'm trying to figure out the difference between:
> 
> /[^~]~$/
> 
> and
> 
> /(?<!~)~$/

Ok, but remember that I suggested /(?:^|[^~])~$/ as an alternate to
the look-behind assertion. /[^~]~$/ is *not* equivalent.
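
A quick way to see the difference is to run all three patterns over the same test strings. This is only a minimal sketch: the %pattern labels and the matching_patterns helper are names made up for illustration, not anything from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The three patterns discussed in this thread, keyed by an illustrative label.
my %pattern = (
    'char class'  => qr/[^~]~$/,
    'look-behind' => qr/(?<!~)~$/,
    'alternation' => qr/(?:^|[^~])~$/,
);

# Hypothetical helper: returns the labels of the patterns that match $str,
# in alphabetical order.
sub matching_patterns {
    my ($str) = @_;
    my @names = grep { $str =~ $pattern{$_} } sort keys %pattern;
    return @names;
}

foreach my $str ("a~", "a~~", "~", "~~") {
    printf "%-4s matched by: %s\n", $str,
        join(", ", matching_patterns($str)) || "(none)";
}
```

The look-behind and the alternation should agree on every string here, while the plain character class misses the lone '~'.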

> 
> From what I've read from the perlretut, (?<!~) is a zero-length
> assertion, right?

Yes.
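
"Zero-length" means the assertion checks the text but does not consume any of it. A small sketch, assuming a made-up helper called matched_text, shows this by printing what each pattern actually consumed, using the built-in @- and @+ offset arrays:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the text the pattern consumed,
# or undef if there was no match. $-[0] and $+[0] are the start
# and end offsets of the last successful match.
sub matched_text {
    my ($str, $re) = @_;
    return $str =~ $re ? substr($str, $-[0], $+[0] - $-[0]) : undef;
}

print "char class consumed:  '", matched_text("a~", qr/[^~]~$/),   "'\n";
print "look-behind consumed: '", matched_text("a~", qr/(?<!~)~$/), "'\n";
```

The character class version consumes two characters ('a~'), while the look-behind version consumes only the tilde.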

> 
> To test these, I ran the following script:
> 
> #!/usr/bin/perl
> 
> @a = ( "a~", "a~~", "~", "~~" );
> 
> foreach ( @a ) {
>    print "[^~] match: $_\n" if /[^~]~$/;
>    print "(?<!~) match: $_\n" if /(?<!~)~$/;
> }
> 
> The output is:
> 
> [^~] match: a~
> (?<!~) match: a~
> (?<!~) match: ~
> 
> The first regexp has three parts to it:
> 
> [^~]~$
> ^   ^^
> |   ||
> 1   23
> 
> When matching with this regexp, perl walks through the string looking
> for part 1 (non-tilde); when it finds a non-tilde character, then it
> looks to match part 2 by ensuring that the next character *is* a
> tilde.  Finally, if part 2 matches, then it looks to make sure there
> are no characters following, which is part 3 ($ - end string anchor).

Yes, essentially.

> 
> So, "~" and "~~" don't match because perl can't match part 1, which is
> the non-tilde character - there *aren't* any non-tilde characters in
> the string.  However, I gather that in this case we *want* "~~" to
> match, so this regexp doesn't suit our needs.

I assume you mean that you want '~' to match, which this regex
does not. Which is why I suggested /(?:^|[^~])~$/ as an alternate,
and not /[^~]~$/.

> 
> The second regexp has three parts to it as well:
> 
> (?<!~)~$
> ^     ^^
> |     ||
> 1     23
> 
> This is very similar, but part 1 is different.  "(?<!~)" is a "negated
> lookbehind zero-length assertion", which is a kind of anchor (right?).
> In this case, when walking through the string looking for a match,
> perl looks for part *2* first, *not* part 1.  In other words, perl
> first looks for a tilde.  When it matches part 2 (finds a tilde), then it
> looks to see that there are no more characters in the string (part 3).
> *Then* perl looks to match part 1, which says that the character
> before the tilde matched in part 2 must *not* be a tilde.

Yes, something like that.

> 
> Thus "a~" and "~" match, but "~~" does not.

Correct.

> 
> Bob, is my summary correct?  I'm just trying to get a handle on this.
> It's obvious that Perl RE's are *incredibly* powerful but there seem
> to be so many things to remember...

Well, I've been making my share of mistakes here lately, but as far
as I can tell, your analysis is right on. The exact algorithm that Perl
uses is not known to me, but the results are as you described.

Now, can you tell why '~' is matched by this pattern?

   /(?:^|[^~])~$/
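
One way to explore that question is to swap the non-capturing (?:...) for a capturing group and look at what it grabbed. This is just a sketch: captured_prefix is a hypothetical helper, and the capturing group is a variation on the pattern above, used here only to make the chosen alternative visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns what the group before the tilde captured,
# or undef if the pattern did not match at all.
sub captured_prefix {
    my ($str) = @_;
    return $str =~ /(^|[^~])~$/ ? $1 : undef;
}

foreach my $str ("~", "a~", "~~") {
    my $prefix = captured_prefix($str);
    if (!defined $prefix) {
        print "$str: no match\n";
    }
    elsif ($prefix eq '') {
        # The ^ alternative matched: it is an anchor and consumes nothing.
        print "$str: matched via ^ (nothing consumed before the tilde)\n";
    }
    else {
        print "$str: matched via [^~], which consumed '$prefix'\n";
    }
}
```

An empty capture means the ^ branch succeeded, so no character was needed in front of the tilde.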
