>
>> What exactly is matched by \g and \G is controlled by two new special
>> variables, @^g and @^G, which are arrays of strings.
>
>These sorts of global variables have been a problem in the past.
>Since they change the meaning of the \g and \G escapes, I think they
>should be pragmas or some other declaration that has a lexical scope.
Good point. Something like:
use pairs '(' => ')', '{' => '}', ...;
perhaps? And it's a compile-time error to have an odd number in that list.
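For what it's worth, here is a rough single-file sketch of how such a lexically scoped declaration might be emulated. It assumes a Perl new enough (5.10 or later) to expose the compile-time hint hash %^H through caller(), as described in perlpragma. The hint key 'pairs/list' and the helper in_effect() are made up for illustration and are not part of the proposal.

```perl
use strict;
use warnings;

package pairs;

# Called at compile time by "use pairs LIST".
sub import {
    my ($class, @list) = @_;
    die "pairs: odd number of delimiters\n" if @list % 2;
    # %^H holds only simple scalars, so pack the list into one string.
    $^H{'pairs/list'} = join "\0", @list;
}

# Hypothetical helper: return the pairs in effect at the caller's scope.
sub in_effect {
    my $level  = shift // 0;
    my $hints  = (caller($level))[10] || {};
    my $packed = $hints->{'pairs/list'};
    return defined $packed ? split(/\0/, $packed) : ();
}

package main;

my (%inside, %outside);
{
    # In a real module this line would read: use pairs '(' => ')', '{' => '}';
    BEGIN { pairs->import('(' => ')', '{' => '}') }
    %inside = pairs::in_effect();
}
%outside = pairs::in_effect();    # the declaration has gone out of scope here

print "inside:  ", scalar(keys %inside),  " pair(s)\n";
print "outside: ", scalar(keys %outside), " pair(s)\n";
```

Because import() runs inside a BEGIN block, the odd-count check fires while the enclosing file is still being compiled, which is the compile-time error suggested above.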
>(Also, the \G escape already has a meaning in Perl 5, so it would
>probably be better to think of some other name.)
Gah, yes. I completely forgot about \G. Someone else pointed this out
to me, too.
How about \p and \P ("P" for "pairwise groupings" or just "pairs")?
>The big problem I see that you didn't address is that you didn't say
>what would happen when the target string contains mismatched
>parentheses.
>[...]
>Now suppose the string were
>
> $string = "(b - a + 1] * 7)";
> $string =~ /\g.*?\G/;
>
>Now what happens here? \g matches "(" and sets up \G so that \G will
>only match the corresponding ")". Then what? I'm not sure from your
>proposal.
>
>Your later example (in the 'implementation' section) suggests that '['
>and ']' are ignored once \g matches a '('. If that is true, then in
>the example above, the .*? would match "b - a + 1] * 7". I think
>this won't be what people will want from \g...\G. We are still going
>to get a lot of questions from people asking how to tell if the
>delimiters in a string are balanced.
Here's my thinking: If you're parsing a string that contains possibly
nested parentheses, you'll want to find what's between the outermost
parentheses first, then recursively check that found text for parentheses.
In your mismatched-delimiter example, "(b - a + 1] * 7)", the \g and \G
set (or \p and \P, if we go that way) match the parentheses "(" and ")".
The .*? matches "b - a + 1] * 7", as you say. The RE has successfully
found a pair of parentheses and their contents.
I was thinking that the next logical thing for the programmer to do
is to recursively check the found string "b - a + 1] * 7" against the
same (or similar) pattern, which would fail, at which point the programmer
would know there's a syntax error.
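The recursive check described above can be sketched in today's Perl. Since \g and \G do not exist, the "find the outermost pair" step is emulated here with a simple depth counter over one delimiter type, ignoring the other types as the proposal does. The function name delimiters_balanced and the %close_for table are hypothetical, and only single-character delimiters are handled.

```perl
use strict;
use warnings;

# Hypothetical pair table, standing in for the proposed delimiter list.
my %close_for = ('(' => ')', '[' => ']', '{' => '}');
my $openers   = join '', map { quotemeta } keys   %close_for;
my $closers   = join '', map { quotemeta } values %close_for;
my $open_re   = qr/[$openers]/;
my $close_re  = qr/[$closers]/;

# Returns true if every delimiter in $text is properly paired and nested.
sub delimiters_balanced {
    my ($text) = @_;
    while (length $text) {
        # No opener left: any remaining closer is a stray one.
        return $text !~ $close_re unless $text =~ /($open_re)/;
        my ($open, $start) = ($1, $-[0]);
        my $close = $close_for{$open};

        # A closer before the first opener has nothing to pair with.
        return 0 if substr($text, 0, $start) =~ $close_re;

        # Emulate \g ... \G: find the closer that pairs with this opener,
        # counting nesting of this delimiter type only.
        my ($depth, $end) = (0);
        for my $i ($start .. length($text) - 1) {
            my $ch = substr($text, $i, 1);
            if    ($ch eq $open)  { $depth++ }
            elsif ($ch eq $close) { if (--$depth == 0) { $end = $i; last } }
        }
        return 0 unless defined $end;    # opener never closed

        # Recursively check what was found between the pair.
        return 0 unless delimiters_balanced(
            substr($text, $start + 1, $end - $start - 1));

        # Continue with whatever follows the pair.
        $text = substr($text, $end + 1);
    }
    return 1;
}

for my $s ("(a + b) * (c + d)", "(b - a + 1] * 7)") {
    printf "%-20s %s\n", $s, delimiters_balanced($s) ? "balanced" : "syntax error";
}
```

Note that a failed match alone is not enough to signal an error, since plain text with no delimiters also fails to match. The sketch therefore treats a leftover closer, or an opener with no partner, as the error condition.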
>(Side note: I'm not sure why you used .*? here instead of .*, since as
>I understand your proposal, .* would have done the same thing. I
>suggest that you change .*? to .* or else add a remark about why this
>would be different.)
Sorry for any confusion. I should have given another example to
clear this up. Consider the string "(a + b) * (c + d)". The regular
expression /\g.*?\G/ would match "(a + b)", while /\g.*\G/ would match
the whole string. The parentheses are not nested; they are all at the
same nesting level.
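The same difference can be seen in current Perl by letting literal parentheses stand in for the proposed escapes. This is only an approximation that happens to work for this non-nested string; it does not reproduce the pairing behaviour of \g and \G.

```perl
use strict;
use warnings;

# Literal "(" and ")" stand in for the proposed \g and \G escapes.
my $string = "(a + b) * (c + d)";

my ($lazy)   = $string =~ /(\(.*?\))/;   # stops at the first ")"
my ($greedy) = $string =~ /(\(.*\))/;    # runs on to the last ")"

print "non-greedy: $lazy\n";     # non-greedy: (a + b)
print "greedy:     $greedy\n";   # greedy:     (a + b) * (c + d)
```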
>Another ambiguity in your proposal: You want
>
> [\g]
>
>to match any single open delimiter character. But then later on you have
>an example where @^g contains the string "/*". What would [\g] do in
>this case?
Match an instance of "/*" if it appears. What are the implications
of an element of a [...] set matching a multi-character string?
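One way to see the implications: an ordinary bracketed class matches exactly one character, so a multi-character opener such as "/*" cannot live inside one today. The usual stand-in is an alternation, with a negative lookahead playing the role of the negated class. The sketch below assumes a hypothetical @openers list in place of @^g.

```perl
use strict;
use warnings;

# Hypothetical opener list, standing in for the proposed @^g.
my @openers = ('/*', '(', '{');

# Longest strings first, so "/*" is tried before any shorter alternative.
my $alt = join '|', map { quotemeta }
          sort { length $b <=> length $a } @openers;

my $open_re     = qr/(?:$alt)/;        # rough equivalent of [\g]
my $not_open_re = qr/(?:(?!$alt).)/s;  # rough equivalent of [^\g]

my $text  = "f(x) /* note */ { y }";
my @found = $text =~ /($open_re)/g;
my ($lead) = $text =~ /^($not_open_re*)/;

print "openers found: @found\n";
print "text before the first opener: '$lead'\n";
```

The catch is that the stand-in for [\g] may consume two characters and the stand-in for [^\g] has to look ahead, so neither behaves like a true single-character class.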
>> As it continues scanning, it encounters the "]" between the "f" and the
>> ")". The \G does not match this "]" character, because the \g must match
>> a ")".
>
>You mean \G here instead of \g, don't you?
Yes. My mistake.
>
>> sub parse
>> {
>> my $string = shift;
>> while ($string =~ /([^\g])*(\g)(.*?)(\G)([^\g\G]*)/g)
>
>Don't you mean ([^\g]*) instead of ([^\g])* here?
Yes, another typo. Thanks for the correction.
----------------------------------------------------------------------
Eric J. Roode, [EMAIL PROTECTED] print scalar reverse sort
Senior Software Engineer 'tona ', 'reh', 'ekca', 'lre',
Myxa Corporation '.r', 'h ', 'uj', 'p ', 'ts';