Re: Working with a regex using positional captures stored in a variable

Joseph Brenner Wed, 17 Mar 2021 10:42:10 -0700

And once again, thanks much for the explication of all this...
But even after thinking it over, the current state-of-affairs on
this really doesn't strike me as being okay.


As I'm sure everyone here knows, over in perl-land the main trick
you have for creating regexes from components is lexical
interpolation, so something like this:

    my $r1 = qr{ (\d+) }x;
    my $r2 = qr{ (\w+) }x;
    $str =~ m/$r1 \s+ $r2/x;

behaves exactly the same as

    $str =~ m/ (\d+) \s+ (\w+) /x;

A direct translation of this approach to Raku doesn't really
work:

    my $r1 = rx{ (\d+) };
    my $r2 = rx{ (\w+) };
    $str ~~ m/<$r1> \s+ <$r2>/;

And it doesn't work in a potentially insidious way: it can *look*
like it's working and it certainly doesn't throw any warnings.
You might use it for some time before noticing there's a feature
missing.

So as is, the /<$regex>/ construct treats the contents of $regex as a
regex-- *except* that it ignores some key features of regexes. It
silently throws away the capture information.
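
To make the problem concrete, here is a minimal sketch. The sample
string and the alias names num and word are made up for illustration;
only the <$var> and <name=$var> forms come from this thread.

```raku
my $str = '42 apples';
my $r1  = rx{ (\d+) };
my $r2  = rx{ (\w+) };

# Plain interpolation: the overall match succeeds,
# but the capture groups inside $r1 and $r2 are not kept.
my $plain = $str ~~ m/ <$r1> \s+ <$r2> /;
say $plain.Str;          # 42 apples
say $plain.list.elems;   # 0 (no positional captures)
say $plain.hash.elems;   # 0 (no named captures)

# Aliased interpolation: each sub-match is kept under its name,
# with its own positional captures nested inside.
my $named = $str ~~ m/ <num=$r1> \s+ <word=$r2> /;
say $named<num>[0].Str;   # 42
say $named<word>[0].Str;  # apples
```

Note that even in the aliased form the captures are nested one level
down, so this is still not the flat $0, $1 you would get from the
Perl-style interpolation.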

Now, it is true that there are other ways of doing regex
composition in Raku that work much better, but I don't think
that's really the issue: more than one way to do it is fine as
long as they all actually work.
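
For comparison, a sketch of one of those other ways, composing from
lexical named regexes as in Brad's "my regex pattern" example. The
names count and unit are hypothetical, chosen just for this
illustration.

```raku
# Each component is a lexical named regex with its own capture group.
my regex count { (\d+) }
my regex unit  { (\w+) }

my $input = 'There are 9 million bicycles in beijing.';
my $m = $input ~~ / <count> \s+ <unit> /;

# Calling a named regex as <name> captures under that name,
# and the inner positional captures survive inside it.
say $m.Str;             # 9 million
say $m<count>[0].Str;   # 9
say $m<unit>[0].Str;    # million
```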

> I would be more likely to accept <=$pattern> being added as a synonym to 
> <pattern=$pattern>.

That could be an improvement. I was thinking something like
<:$pattern>, in analogy to colon pairs.

(At the very least: this alternate way would get documented, and
then we'd have to distinguish between it and the other one, and
explain that it's missing a feature.)


On 3/13/21, Brad Gilbert <b2gi...@gmail.com> wrote:
> It makes <…> more consistent precisely because <$pattern> doesn't capture.
>
> If the first character inside is anything other than an alpha it doesn't
> capture.
> Which is a very simple description of when it captures.
>
>     <?before …> doesn't capture because of the ｢?｣
>     <!before …> doesn't capture because of the ｢!｣
>     <.ws> doesn't capture because of the ｢.｣
>     <&ws> doesn't capture because of the ｢&｣
>     <$pattern> doesn't capture because of the ｢$｣
>     <$0> doesn't capture because of the ｢$｣
>     <@a> doesn't capture because of the ｢@｣
>     <[…]> doesn't capture because of the ｢[｣
>     <-[…]> doesn't capture because of the ｢-｣
>     <:Ll> doesn't capture because of the ｢:｣
>
> For most of those, you don't actually want it to capture.
> With ｢.｣ the whole point is that it doesn't capture.
>
>     <digit> does capture because it starts with an alpha
>     <pattern=$pattern> does capture because it starts with an alpha
>
>     $0 = <$pattern> doesn't capture to $<pattern>, but does capture to $0
>     $<pattern> = <$pattern> captures because of $<pattern> =
>
> It would be a mistake to just make <$pattern> capture.
> Consistency is perhaps Raku's most important feature.
>
> One of the mottos of Raku is that it is ok to confuse a new programmer,
> but it is not ok to confuse an expert.
> An expert in Raku understands the deep fundamental ways that Raku is
> consistent.
> So breaking consistency should be very carefully considered.
>
> In this case, there is very little benefit.
> Even worse, you then have to come up with some new syntax to prevent it
> from capturing when you don't want it to.
> That new syntax wouldn't be as guessable as it currently is. Which again
> would confuse experts.
>
> If anyone seriously suggests such a change, I will vehemently fight to
> prevent it from happening.
>
> I would be more likely to accept <=$pattern> being added as a synonym to
> <pattern=$pattern>.
>
> On Sat, Mar 13, 2021 at 3:30 PM Joseph Brenner <doom...@gmail.com> wrote:
>
>> Thanks much for your answer on this.  I think this is the sort of
>> trick I was looking for:
>>
>> Brad Gilbert <b2gi...@gmail.com> wrote:
>>
>> > You can put it back in as a named
>>
>> >     > $input ~~ / <pattern=$pattern> /
>> >     ｢9 million｣
>> >      pattern => ｢9 million｣
>> >       0 => ｢9｣
>> >       1 => ｢million｣
>>
>> That's good enough, I guess, though you need to know about the
>> issue... is there some reason it shouldn't happen automatically,
>> using the variable name to label the captures?
>>
>> I don't think this particular gotcha is all that well
>> documented, though I guess there's a reference to this being a
>> "known trap" in the documentation under "Regex interpolation"--
>> but that's the sort of remark that makes sense only after you know
>> what it's talking about.
>>
>> I have to say, my first reaction was something like "if they
>> couldn't get this working right, why did they put it in?"
>>
>>
>> On 3/11/21, Brad Gilbert <b2gi...@gmail.com> wrote:
>> > If you interpolate a regex, it is a sub regex.
>> >
>> > If you have something like a sigil, then the match data structure gets
>> > thrown away.
>> >
>> > You can put it back in as a named
>> >
>> >     > $input ~~ / <pattern=$pattern> /
>> >     ｢9 million｣
>> >      pattern => ｢9 million｣
>> >       0 => ｢9｣
>> >       1 => ｢million｣
>> >
>> > Or as a numbered:
>> >
>> >     > $input ~~ / $0 = <$pattern> /
>> >     ｢9 million｣
>> >      0 => ｢9 million｣
>> >       0 => ｢9｣
>> >       1 => ｢million｣
>> >
>> > Or put it in as a lexical regex
>> >
>> >     > my regex pattern { (\d+) \s+ (\w+) }
>> >     > $input ~~ / <pattern>  /
>> >     ｢9 million｣
>> >      pattern => ｢9 million｣
>> >       0 => ｢9｣
>> >       1 => ｢million｣
>> >
>> > Or just use it as the whole regex
>> >
>> >     > $input ~~ $pattern # variable
>> >     ｢9 million｣
>> >      0 => ｢9｣
>> >      1 => ｢million｣
>> >
>> >     > $input ~~ &pattern # my regex pattern /…/
>> >     ｢9 million｣
>> >      0 => ｢9｣
>> >      1 => ｢million｣
>> >
>> > On Thu, Mar 11, 2021 at 2:29 AM Joseph Brenner <doom...@gmail.com> wrote:
>> >
>> >> Does this behavior make sense to anyone?  When you've got a regex
>> >> with captures in it, the captures don't work if the regex is
>> >> stashed in a variable and then interpolated into a regex.
>> >>
>> >> Do capture groups need to be defined at the top level where the
>> >> regex is used?
>> >>
>> >> { # From a code example in the "Parsing" book by Moritz Lenz,
>> >>   # p. 48, section 5.2
>> >>    my $input = 'There are 9 million bicycles in beijing.';
>> >>    if $input ~~ / (\d+) \s+ (\w+) / {
>> >>        say $0.^name;  # Match
>> >>        say $0;        # ｢9｣
>> >>        say $1.^name;  # Match
>> >>        say $1;        # ｢million｣
>> >>        say $/;
>> >>         # ｢9 million｣
>> >>         #  0 => ｢9｣
>> >>         #  1 => ｢million｣
>> >>    }
>> >> }
>> >>
>> >> say '---';
>> >>
>> >> { # Moving the pattern to var which we interpolate into match
>> >>    my $input = 'There are 9 million bicycles in beijing.';
>> >>    my $pattern = rx{ (\d+) \s+ (\w+) };
>> >>    if $input ~~ / <$pattern> / {
>> >>        say $0.^name;  # Nil
>> >>        say $0;        # Nil
>> >>        say $1.^name;  # Nil
>> >>        say $1;        # Nil
>> >>        say $/;        # ｢9 million｣
>> >>    }
>> >> }
>> >>
>> >> In the second case, the match clearly works, but it behaves as
>> >> though the capture groups aren't there.
>> >>
>> >>
>> >>    raku --version
>> >>
>> >>    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
>> >>    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
>> >>
>> >
>>
>
