>
>> What exactly is matched by \g and \G is controlled by two new special
>> variables, @^g and @^G, which are arrays of strings. 
>
>These sorts of global variables have been a problem in the past.
>Since they change the meaning of the \g and \G escapes, I think they
>should be pragmas or some other declaration that has a lexical scope.

    Good point. Something like:
    
        use pairs '(' => ')', '{' => '}', ...;
        
perhaps? An odd number of elements in that list would be a compile-time error.
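Here is a rough model of how such a declaration might be read. It is written in Python purely for illustration: no such pragma exists, and `build_pairs` is a hypothetical name. The flat list is taken two elements at a time as opener/closer pairs. An odd-length list is rejected, with an exception standing in for the compile-time error.

```python
def build_pairs(*items):
    """Hypothetical model of `use pairs LIST;`: read the flat list two
    elements at a time as (opener, closer) and return a lookup table."""
    if len(items) % 2:
        # The proposal treats an odd-length list as a compile-time error;
        # an exception stands in for that here.
        raise ValueError("use pairs: odd number of elements in list")
    return dict(zip(items[0::2], items[1::2]))


pairs = build_pairs("(", ")", "{", "}", "/*", "*/")
print(pairs)  # {'(': ')', '{': '}', '/*': '*/'}
```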


>(Also, the \G escape already has a meaning in Perl 5, so it would
>probably be better to think of some other name.)

    Gah, yes. I completely forgot about \G. Someone else pointed this out
to me, too. 

    How about \p and \P  ("P" for "pairwise groupings" or just "pairs")?



>The big problem I see that you didn't address is that you didn't say
>what would happen when the target string contains mismatched
>parentheses.
>[...]
>Now suppose the string were 
>
>    $string = "(b - a + 1] * 7)";
>    $string =~ /\g.*?\G/;
>
>Now what happens here?  \g matches "(" and sets up \G so that \G will
>only match the corresponding ")".  Then what?  I'm not sure from your
>proposal.  
>
>Your later example (in the 'implementation' section) suggests that '['
>and ']' are ignored once \g matches a '('.  If that is true, then in
>the example above, the .*?  would match "b - a + 1] * 7".  I think
>this won't be what people will want from \g...\G.  We are still going
>to get a lot of questions from people asking how to tell if the
>delimiters in a string are balanced.

    Here's my thinking: If you're parsing a string that contains possibly
nested parentheses, you'll want to find what's between the outermost 
parentheses first, then recursively check that found text for parentheses.

    In your mismatched-bracket example, "(b - a + 1] * 7)", the \g and \G
set (or \p and \P, if we go that way) match the parentheses "(" and ")".
The .*? matches "b - a + 1] * 7", as you say. The RE has successfully
found a matched pair of parentheses and their contents.

    I was thinking that the next logical thing for the programmer to do
is to recursively check the found string "b - a + 1] * 7" against the
same (or a similar) pattern. That check would fail, and at that point the
programmer would know there's a syntax error.
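The outer-pair-then-recurse approach can be sketched in Python as below. This is a hypothetical model, not an implementation of the proposal. `PAIRS`, `find_pair` and `is_balanced` are illustrative names. `find_pair` assumes the reading discussed above, in which only the delimiter pair chosen by the first opener is tracked and every other bracket is treated as plain text.

```python
# Hypothetical delimiter table standing in for a `use pairs` declaration.
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def find_pair(s):
    r"""Model of /\g.*?\G/: return (start, end) indices of the first opener
    and its corresponding closer, or None. Only the opener's own pair is
    counted; other delimiters are treated as ordinary characters."""
    for start, ch in enumerate(s):
        if ch not in PAIRS:
            continue
        close, depth = PAIRS[ch], 0
        for i in range(start, len(s)):
            if s[i] == ch:
                depth += 1
            elif s[i] == close:
                depth -= 1
                if depth == 0:
                    return start, i
        # This opener is never closed; like a regex engine, retry the
        # opener match at a later position.
    return None


def is_balanced(s):
    """Find the first pair, then check its contents and the text on either
    side of it recursively. Leftover delimiters with no partner fail."""
    span = find_pair(s)
    if span is None:
        return not any(c in PAIRS or c in CLOSERS for c in s)
    start, end = span
    return (is_balanced(s[:start])
            and is_balanced(s[start + 1:end])
            and is_balanced(s[end + 1:]))


s = "(b - a + 1] * 7)"
start, end = find_pair(s)
print(s[start + 1:end])  # b - a + 1] * 7
print(is_balanced(s))    # False: the inner "]" has no partner
```

Checking the text on both sides of the found pair, not just its contents, means a stray delimiter anywhere in the string is reported.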


>(Side note: I'm not sure why you used .*? here instead of .*, since as
>I understand your proposal, .* would have done the same thing.  I
>suggest that you change .*? to .* or else add a remark about why this
>would be different.)

    Sorry for any confusion. I should have given another example to 
clear this up. Consider the string "(a + b) * (c + d)". The regular
expression /\g.*?\G/ would match "(a + b)", while /\g.*\G/ would match
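the whole string. The parentheses are not nested; the two pairs sit side
by side at the same nesting level, so the non-greedy .*? stops at the
first ")" while the greedy .* runs on to the last one.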
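The same difference appears in any Perl-style regex engine if fixed parentheses stand in for \g and \G. Here is a quick check in Python's re module, whose non-greedy and greedy quantifiers behave like Perl's in this case:

```python
import re

s = "(a + b) * (c + d)"

# Literal \( and \) stand in for the proposed \g and \G.
lazy = re.search(r"\(.*?\)", s).group()   # stops at the first ")"
greedy = re.search(r"\(.*\)", s).group()  # runs on to the last ")"

print(lazy)    # (a + b)
print(greedy)  # (a + b) * (c + d)
```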


>Another ambiguity in your proposal:  You want
>
>        [\g]
>
>to match any single open delimiter character.  But then later on you have
>an example where @^g contains the string "/*".  What would [\g] do in
>this case?

    Match an instance of "/*" if it appears. What are the implications
of an element of a [...] set matching a multi-character string?
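For comparison, existing regex character classes only ever match a single character, so "/*" inside a class falls apart into "either '/' or '*'". One way [\g] could be given a multi-character meaning is to expand it into an alternation of the openers, longest first. That expansion is only an assumption about how it might work. Python's re is used to show both behaviours, and the `openers` list is hypothetical.

```python
import re

text = "x = a / b; /* note */ (c)"
openers = ["(", "{", "/*"]  # hypothetical contents of the pairs list

# A class matches exactly one character: "/*" degrades to '/' or '*'.
print(re.findall(r"[/*(]", text))  # ['/', '/', '*', '*', '/', '(']

# An alternation can match a whole multi-character opener. Longer
# openers go first so that, if both "/" and "/*" were listed, "/*" wins.
alternation = "|".join(
    re.escape(o) for o in sorted(openers, key=len, reverse=True)
)
print(re.findall(alternation, text))  # ['/*', '(']
```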



>>    As it continues scanning, it encounters the "]" between the "f" and the
>>    ")". The \G does not match this "]" character, because the \g must match
>>    a ")".
>
>You mean \G here instead of \g, don't you?

    Yes. My mistake.

>
>> sub parse
>> {
>>     my $string = shift;
>>     while ($string =~ /([^\g])*(\g)(.*?)(\G)([^\g\G]*)/g)
>
>Don't you mean ([^\g]*) instead of ([^\g])* here?

    Yes, another typo. Thanks for the correction.
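The snippet below is a quick illustration of why the placement of the * matters. It uses Python's re, which, like Perl, keeps only the last iteration's capture for a repeated group, with literal "(" and ")" standing in for \g and \G. Plain regexes cannot express the "corresponding closer" part of the proposal, so this stand-in only behaves like it when the parentheses are not nested. `PATTERN` and `parse` are hypothetical names.

```python
import re

s = "x + (y) * z"

# Quantifier outside the group, as in the quoted ([^\g])*: the group is
# re-entered once per character and only the last capture survives.
print(repr(re.match(r"([^(])*", s).group(1)))  # ' '

# Quantifier inside the group, as in the corrected ([^\g]*): one capture
# holding the whole run of non-opener characters.
print(repr(re.match(r"([^(]*)", s).group(1)))  # 'x + '

# The corrected loop pattern with "(" and ")" standing in for \g and \G.
PATTERN = re.compile(r"([^(]*)(\()(.*?)(\))([^()]*)")


def parse(string):
    """Hypothetical analogue of the quoted parse() loop: yield
    (before, open, inside, close, after) for each successive pair."""
    for m in PATTERN.finditer(string):
        yield m.groups()


for groups in parse("(a + b) * (c + d)"):
    print(groups)
# ('', '(', 'a + b', ')', ' * ')
# ('', '(', 'c + d', ')', '')
```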


 ----------------------------------------------------------------------
 Eric J. Roode,  [EMAIL PROTECTED]           print  scalar  reverse  sort
 Senior Software Engineer                'tona ', 'reh', 'ekca', 'lre',
 Myxa Corporation                        '.r', 'h ', 'uj', 'p ', 'ts';
