Re: Working with a regex using positional captures stored in a variable

yary Wed, 17 Mar 2021 14:33:03 -0700

The "Interpolation" section of the Raku docs uses strings as the elements for
building up a larger regex from smaller pieces, but the example that looks
fruitful isn't working in my raku. This is taken from
https://docs.raku.org/language/regexes#Regex_interpolation


> my $string   = 'Is this a regex or a string: 123\w+False$pattern1 ?';

Is this a regex or a string: 123\w+False$pattern1 ?

> my $regex    = /\w+/;

/\w+/

> say $string.match: / $regex /;

Regex object coerced to string (please use .gist or .raku to do that)
 ... and more error lines, and no result when the docs show matching '123':

｢｣


$ raku -v

Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.

Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.

Built on MoarVM version 2020.10.


-y


On Wed, Mar 17, 2021 at 3:17 PM William Michels via perl6-users <
perl6-us...@perl.org> wrote:

> Dear Brad,
>
> 1. The list you posted is fantastic ("If the first character inside is
> anything other than an alpha it doesn't capture"). It should be added to
> the Raku Docs ASAP.
>
> 2. There are some shortcuts that don't seem to follow a set pattern. For
> example a named capture can be accessed using $<myname> instead of
> $/<myname> ; the "/" can be elided. Do you have a method you can share for
> remembering these sorts of shortcuts? Or are they disfavored?
>
> > say ~$<myname> if 'abc' ~~ / $<myname> = [ \w+ ] /;
> abc
> >
> [ Above from the example at https://docs.raku.org/syntax/Named%20captures
> ].
>
> 3. Finally, I've never seen in the Perl6/Raku literature the motto you
> cite: "One of the mottos of Raku, is that it is ok to confuse a new
> programmer, it is not ok to confuse an expert." Do you have a citation?
>
> [ The motto I prefer is from Larry Wall: "...easy things should stay easy,
> hard things should get easier, and impossible things should get hard... ."
> Citation: https://www.perl.com/pub/2000/10/23/soto2000.html/ ].
>
> Best Regards,
>
> Bill.
>
>
>
> On Sat, Mar 13, 2021 at 4:47 PM Brad Gilbert <b2gi...@gmail.com> wrote:
>
>> It makes <…> more consistent precisely because <$pattern> doesn't capture.
>>
>> If the first character inside is anything other than an alpha it doesn't
>> capture.
>> Which is a very simple description of when it captures.
>>
>>     <?before …> doesn't capture because of the ｢?｣
>>     <!before …> doesn't capture because of the ｢!｣
>>     <.ws> doesn't capture because of the ｢.｣
>>     <&ws> doesn't capture because of the ｢&｣
>>     <$pattern> doesn't capture because of the ｢$｣
>>     <$0> doesn't capture because of the ｢$｣
>>     <@a> doesn't capture because of the ｢@｣
>>     <[…]> doesn't capture because of the ｢[｣
>>     <-[…]> doesn't capture because of the ｢-｣
>>     <:Ll> doesn't capture because of the ｢:｣
>>
>> For most of those, you don't actually want it to capture.
>> With ｢.｣ the whole point is that it doesn't capture.
>>
>>     <digit> does capture because it starts with an alpha
>>     <pattern=$pattern> does capture because it starts with an alpha
>>
>>     $0 = <$pattern> doesn't capture to $<pattern>, but does capture to $0
>>     $<pattern> = <$pattern> captures because of $<pattern> =
>>
>> It would be a mistake to just make <$pattern> capture.
>> Consistency is perhaps Raku's most important feature.
>>
>> One of the mottos of Raku, is that it is ok to confuse a new programmer,
>> it is not ok to confuse an expert.
>> An expert in Raku understands the deep fundamental ways that Raku is
>> consistent.
>> So breaking consistency should be very carefully considered.
>>
>> In this case, there is very little benefit.
>> Even worse, you then have to come up with some new syntax to prevent it
>> from capturing when you don't want it to.
>> That new syntax wouldn't be as guessable as it currently is. Which again
>> would confuse experts.
>>
>> If anyone seriously suggests such a change, I will vehemently fight to
>> prevent it from happening.
>>
>> I would be more likely to accept <=$pattern> being added as a synonym to
>> <pattern=$pattern>.
>>
>> On Sat, Mar 13, 2021 at 3:30 PM Joseph Brenner <doom...@gmail.com> wrote:
>>
>>> Thanks much for your answer on this.  I think this is the sort of
>>> trick I was looking for:
>>>
>>> Brad Gilbert<b2gi...@gmail.com> wrote:
>>>
>>> > You can put it back in as a named capture:
>>>
>>> >     > $input ~~ / <pattern=$pattern> /
>>> >     ｢9 million｣
>>> >      pattern => ｢9 million｣
>>> >       0 => ｢9｣
>>> >       1 => ｢million｣
>>>
>>> That's good enough, I guess, though you need to know about the
>>> issue... is there some reason it shouldn't happen automatically,
>>> using the variable name to label the captures?
>>>
>>> I don't think this particular gotcha is all that well
>>> documented, though I guess there's a reference to this being a
>>> "known trap" in the documentation under "Regex interpolation"--
>>> but that's the sort of remark that makes sense only after you know
>>> what it's talking about.
>>>
>>> I have to say, my first reaction was something like "if they
>>> couldn't get this working right, why did they put it in?"
>>>
>>>
>>> On 3/11/21, Brad Gilbert <b2gi...@gmail.com> wrote:
>>> > If you interpolate a regex, it is a sub regex.
>>> >
>>> > If you have something like a sigil, then the match data structure gets
>>> > thrown away.
>>> >
>>> > You can put it back in as a named capture:
>>> >
>>> >     > $input ~~ / <pattern=$pattern> /
>>> >     ｢9 million｣
>>> >      pattern => ｢9 million｣
>>> >       0 => ｢9｣
>>> >       1 => ｢million｣
>>> >
>>> > Or as a numbered:
>>> >
>>> >     > $input ~~ / $0 = <$pattern> /
>>> >     ｢9 million｣
>>> >      0 => ｢9 million｣
>>> >       0 => ｢9｣
>>> >       1 => ｢million｣
>>> >
>>> > Or put it in as a lexical regex
>>> >
>>> >     > my regex pattern { (\d+) \s+ (\w+) }
>>> >     > $input ~~ / <pattern>  /
>>> >     ｢9 million｣
>>> >      pattern => ｢9 million｣
>>> >       0 => ｢9｣
>>> >       1 => ｢million｣
>>> >
>>> > Or just use it as the whole regex
>>> >
>>> >     > $input ~~ $pattern # variable
>>> >     ｢9 million｣
>>> >      0 => ｢9｣
>>> >      1 => ｢million｣
>>> >
>>> >     > $input ~~ &pattern # my regex pattern {…}
>>> >     ｢9 million｣
>>> >      0 => ｢9｣
>>> >      1 => ｢million｣
>>> >
>>> > On Thu, Mar 11, 2021 at 2:29 AM Joseph Brenner <doom...@gmail.com> wrote:
>>> >
>>> >> Does this behavior make sense to anyone?  When you've got a regex
>>> >> with captures in it, the captures don't work if the regex is
>>> >> stashed in a variable and then interpolated into a regex.
>>> >>
>>> >> Do capture groups need to be defined at the top level where the
>>> >> regex is used?
>>> >>
>>> >> { #  From a code example in the "Parsing" book by Moritz Lenz, p. 48, section 5.2
>>> >>    my $input = 'There are 9 million bicycles in beijing.';
>>> >>    if $input ~~ / (\d+) \s+ (\w+) / {
>>> >>        say $0.^name;  # Match
>>> >>        say $0;        # ｢9｣
>>> >>        say $1.^name;  # Match
>>> >>        say $1;        # ｢million｣
>>> >>        say $/;
>>> >>         # ｢9 million｣
>>> >>         #  0 => ｢9｣
>>> >>         #  1 => ｢million｣
>>> >>    }
>>> >> }
>>> >>
>>> >> say '---';
>>> >>
>>> >> { # Moving the pattern to var which we interpolate into match
>>> >>    my $input = 'There are 9 million bicycles in beijing.';
>>> >>    my $pattern = rx{ (\d+) \s+ (\w+) };
>>> >>    if $input ~~ / <$pattern> / {
>>> >>        say $0.^name;  # Nil
>>> >>        say $0;        # Nil
>>> >>        say $1.^name;  # Nil
>>> >>        say $1;        # Nil
>>> >>        say $/;        # ｢9 million｣
>>> >>    }
>>> >> }
>>> >>
>>> >> In the second case, the match clearly works, but it behaves as
>>> >> though the capture groups aren't there.
>>> >>
>>> >>
>>> >>    raku --version
>>> >>
>>> >>    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
>>> >>    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
>>> >>
>>> >
>>>
>>
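
For reference, the capture rule described in Brad Gilbert's message above can
be checked with a short script. This is only a minimal sketch, assuming the
same $input and $pattern as in the original question; the values in the
comments are the ones already reported in the thread.

    my $input   = 'There are 9 million bicycles in beijing.';
    my $pattern = rx{ (\d+) \s+ (\w+) };

    # Leading '$' inside <…>: the regex matches, but nothing is captured.
    $input ~~ / <$pattern> /;
    say ~$/;             # 9 million
    say $0;              # Nil

    # Leading alpha (an alias): the sub-match is kept under $<pattern>.
    $input ~~ / <pattern=$pattern> /;
    say ~$<pattern>;     # 9 million
    say ~$<pattern>[0];  # 9
    say ~$<pattern>[1];  # million

    # Explicit positional alias: the sub-match is kept under $0.
    $input ~~ / $0 = <$pattern> /;
    say ~$0;             # 9 million
    say ~$0[1];          # million

In the last two forms the inner captures are nested one level down, so they
are reached through the alias ($<pattern>[0], $0[1]) rather than directly as
$0 and $1.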
