Re: perl6-regex: retaining $/.pos after an unsuccessful match without a temporary variable?

Brad Gilbert Sun, 18 Aug 2019 14:18:38 -0700

Perl6 improved on regexes precisely by not inheriting decades of
accumulated cruft.


Perl (prior to 6) expanded upon regular expressions in ways they were not
designed for.
(They were not designed to be expanded at all.)
That has led to hard-to-guess extensions, because they are not all that
consistent.

Perl6 throws out all of them and starts over with the basics, but makes
small changes so that it is expandable in a consistent way.
It then proceeds to re-add many of the advanced features of Perl5 regexes,
using that redesigned expandability.
(One of those small changes is that all non-alphanumeric characters are
reserved for meta-syntactic features.)
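
As a minimal sketch of that rule (the sample string is made up for
illustration and is not from the thread): a non-alphanumeric character
only matches literally when it is quoted or backslash-escaped.

```raku
# Sample string made up for illustration.
my $text = "a+b = c";

# Quoting makes the + literal.
my $quoted  = ($text ~~ / 'a+b' /).so;     # True

# A backslash also makes it literal.
my $escaped = ($text ~~ / a \+ b /).so;    # True

# Unquoted, + is the "one or more" quantifier, so this looks for
# one or more "a" directly followed by "b", which the string lacks.
my $bare    = ($text ~~ / a+b /).so;       # False

say "quoted: $quoted, escaped: $escaped, bare: $bare";
```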

In order to keep the design clean, some of the rarely used and more
esoteric features were relegated to being done using regular Perl6 code.
The whole reason they were done the way they were in Perl5 is because
regexes are a separate language with a separate compiler so you couldn't
really do those things without those extensions.
Since regexes are code in Perl6, using the same compiler, there is no
difference between regex code which does something and Perl6 code which
does the same thing.
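
A rough sketch of what that looks like in practice, assuming a made-up
input string and a made-up "valid port number" rule (neither is from the
thread): a `{ ... }` block runs ordinary code in the middle of a match,
and a `<?{ ... }>` assertion lets ordinary code decide whether the match
may continue.

```raku
# Hypothetical example data; not from the original thread.
my $input = "port 8080 and port 99999";

# A code block { ... } runs ordinary code at that point in the match.
# $0 is the first capture of the match in progress.
my $seen;
$input ~~ / port \s+ (\d+) { $seen = +$0 } /;

# <?{ ... }> is a code assertion: the match only continues if the code
# returns a true value. The >> (right word boundary) stops \d+ from
# backtracking to a shorter run of digits that would pass the check.
my @ports = $input.match(/ port \s+ (\d+) >> <?{ $0 < 65536 }> /, :g)
                  .map({ +$_[0] });

say "first number seen: $seen";
say "valid ports: @ports[]";
```

The range check is written as plain code because that is simpler than
encoding it in regex syntax; it needs no dedicated regex extension.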

On Sun, Aug 18, 2019 at 11:45 AM Aureliano Guedes <
guedes.aureli...@gmail.com> wrote:

> Even being another language, Perl6 should be inheriting Perl5's regexes or
> even improving it not making it uglier and harder.
>
> Or I'm seeing how to use it in an easy way. Also, dunno if there is some
> GOOD documentation to Perl6 regexes.
>
> On Sun, Aug 18, 2019 at 12:56 PM Brad Gilbert <b2gi...@gmail.com> wrote:
>
>> The Perl6 regex system was simplified.
>> Instead you use the rest of Perl6 to implement those features.
>> Both inside and outside of the regexes.
>> (Which means there are fewer esoteric features that are rarely used, and
>> often forgotten or never learned.)
>>
>> It might be best to just store the position at the end of the regex.
>>
>>     use v6;
>>     my $test = "      foo bar";
>>     my $pos = 0;
>>
>>     $test ~~ m /^\s+  {$pos = $/.pos}/;
>>     $test ~~ m :pos($pos) /foo\s+  {$pos = $/.pos}/;
>>     $test ~~ m :pos($pos) /willnotmatch  {$pos = $/.pos}/;
>>     $test ~~ m :pos($pos) /bar   {$pos = $/.pos}/;
>>
>>     say $/.pos; # yields "13"
>>
>> If you want something a bit closer:
>>
>> Here `\G` translates to `:pos($/.pos // $pos)` and `/c` translates to
>> `{let $pos = $/.pos}`.
>>
>>     use v6;
>>     my $test = "      foo bar";
>>     my $pos = 0;
>>
>>
>>     $test ~~ m /^\s+/;
>>     $test ~~ m :pos($/.pos // $pos) /foo\s+/;
>>     $test ~~ m :pos($/.pos // $pos) /{let $pos = $/.pos} willnotmatch/;
>>     $test ~~ m :pos($/.pos // $pos) /bar/;
>>
>>     say $/.pos; # yields "13"
>>
>> You could combine ($/.pos // $pos) into a function:
>>
>>     use v6;
>>     my $test = "      foo bar";
>>     my $pos = 0;
>>
>>     sub pos () { OUTERS::<$/>.pos // $pos } # may defeat optimizations
>>
>>     $test ~~ m /^\s+/;
>>     $test ~~ m :pos(pos) /foo\s+/;
>>     $test ~~ m :pos(pos) /{let $pos = $/.pos} willnotmatch/;
>>     $test ~~ m :pos(pos) /bar/;
>>
>>     say $/.pos; # yields "13"
>>
>> Note that in Perl6 regexes are code.
>> (They can have parameters and lexical variables.)
>>
>> Personally I would just combine them into a grammar.
>>
>>     grammar FooBar {
>>         token TOP {
>>             $<space> = \s+
>>             <foo>
>>             [
>>             || <willnotmatch>
>>             || <bar>
>>             ]
>>         }
>>
>>         token foo { foo \s+ }
>>         token willnotmatch { willnotmatch }
>>         token bar { bar }
>>     }
>>
>> The || is there to get the Perl5 semantics of trying the alternatives in
>> order. It would actually be better to use |, which prefers the longest
>> match.
>>
>> Anyway the result is structured.
>>
>>     say FooBar.parse($test);
>>     ｢      foo bar｣
>>      space => ｢      ｣
>>      foo => ｢foo ｣
>>      bar => ｢bar｣
>>
>> You could get the same structure out of a single regex as well.
>>
>>     $test ~~ m {
>>         ^
>>         $<space> = \s+
>>         $<foo> = [foo \s+]
>>         [
>>         | $<willnotmatch> = willnotmatch
>>         | $<bar> = bar
>>         ]
>>     }
>>
>> I'm not sure if you want to include the space(s) after `foo` into
>> `$/<foo>`.
>> But to keep it the same as the original I used `[]` to group `foo` and
>> `\s+`.
>>
>> On Sun, Aug 18, 2019 at 9:48 AM Raymond Dresens <
>> raymond.dres...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> In the past few days I've been converting some "incremental
>>> parsing"-regex code from Perl 5 to Perl 6 (I haven't touched grammars,
>>> yet...)
>>>
>>> In Perl 5 I often used the /g and /c modifiers so that the following
>>> snippet of code doesn't die:
>>>
>>>     # perl5
>>>     use feature 'say';
>>>
>>>     my $test = "      foo bar";
>>>
>>>     die unless $test =~ /^\s+/g;
>>>     die unless $test =~ /\Gfoo\s+/g;
>>>     # ...what is the equivalent of the 'c' modifier in Perl 6?
>>>     die if     $test =~ /\Gwillnotmatch/gc;
>>>     die unless $test =~ /\Gbar/g;
>>>
>>>     say pos $test; # yields "13"
>>>
>>> I managed to translate such code to Perl 6 by using the "m:p/.../"
>>> regex.
>>>
>>> When I anticipate an unsuccessful match, it seems I must temporarily
>>> store $/.pos and provide it to the next regex (e.g. "m:p($P)//").
>>>
>>> That works, but it riddles my current solutions with "$P = $/.pos;"
>>> assignments.
>>>
>>> Is there a way to retain this match like in Perl 5?
>>>
>>> Is there a better way in general?
>>>
>>> ... perhaps it's time to look into grammars? ;)
>>>
>>> Thanks!
>>>
>>> Regards,
>>>
>>> Raymond Dresens.
>>>
>
> --
> Aureliano Guedes
>
