Re: I need help understanding a match

Todd Chester via perl6-users Thu, 16 Jan 2025 01:42:03 -0800

Thank you!


On 1/13/25 18:20, William Michels via perl6-users wrote:

Hi Todd,

First I should apologize for one of my earlier posts. The first token was a bit of a 
jumble. I think now you just want the literal string "download" to start your 
capture.

As per usual I tried a few different approaches to your regex problem, and posted what I 
thought was the best one. However, an older iteration crept into one of my email posts: it 
used `^`, which is Raku's zero-width "start-of-string" regex token.

If you use `^` you will capture from the start-of-string onward, in this case through the 
`.*?` any-character token and up to the `\>` angle. You may not want this, as it actually 
means the word "download" isn't required for you to capture that sequence of 
characters.
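
Here is a minimal sketch of that difference. The test string `$s` and the variable names are made up for illustration, not taken from your real data:

```raku
# Hypothetical input that does NOT contain the word "download".
my $s = 'foo>bar';

# Anchored with `^`: matching starts at the start-of-string, so
# something is captured even though "download" never appears.
my $anchored = $s ~~ / ^ .*? <?before \> > /;
say $anchored.Str;     # foo

# Starting with the literal `download`: nothing matches here.
my $literal = so $s ~~ / download .*? <?before \> > /;
say $literal;          # False
```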

I'm not sure where you got the impression that `\...\` actually means anything specific in Raku. If you're asking for 
a match against alphanumeric characters in Raku you don't have to escape them. Anything else (e.g. punctuation) you'll 
have to escape. So if you're trying to match ">", the "greater-than" sign (angle), 
you'll have to escape it via a backslash (e.g. `\>`) or by quoting (e.g. `">"`).
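
A short sketch of the escaping options, assuming a made-up test string `$s`:

```raku
my $s = 'a>b';              # hypothetical test string

say so $s ~~ / \> /;        # True (backslash-escaped)
say so $s ~~ / '>' /;       # True (single-quoted)
say so $s ~~ / ">" /;       # True (double-quoted)
say so $s ~~ / a \> b /;    # True (alphanumerics need no escaping)
```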

Unescaped punctuation characters are reserved for special 
"metacharacter" purposes: for example an unescaped "." dot means 
"any-character". You'll also note backslashing used to denote characters that are difficult to 
represent otherwise. Think for example how `\n` means newline, `\t` means tab. There are others: `\s` means 
whitespace, `\h` means horizontal-whitespace, and `\v` means vertical-whitespace. Also `\S` means 
non-whitespace, `\H` means non-horizontal-whitespace, and `\V` means non-vertical-whitespace.
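
To see these character classes side by side, here is a small sketch that counts matches with `.comb` (the test string `$s` is invented for the purpose):

```raku
# Hypothetical string: one space, one tab, one newline.
my $s = "a b\tc\nd";

say $s.comb(/ \h /).elems;    # 2  (space and tab)
say $s.comb(/ \v /).elems;    # 1  (newline)
say $s.comb(/ \s /).elems;    # 3  (all whitespace)
say $s.comb(/ \S /).elems;    # 4  (a, b, c, d)
say $s.comb(/ . /).elems;     # 7  (unescaped dot: any character)
```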

I've also posted direct links to Raku regex forms, such as `<?before ... >` (a 
positive lookahead) and `<?after ... >` (a positive lookbehind). You can try this in 
the REPL:

[0] > my $a = "XYZ"
XYZ
[1] > say $a ~~ m/ <?after X > Y <?before Z > /;
｢Y｣

Try reading that out loud in English, "say $a smartmatching against a requested `m` 
match comprising after-X, Y, before-Z". If you read it that way, you'll understand 
why only the `｢Y｣` ends up in the match variable. You can also `andthen` the smartmatch, 
which will put the match in the `$_` topic variable for you, which can help with 
stringification:

[1] > $a ~~ m/ <?after X > Y <?before Z > / andthen put $_.Str;
Y

I'll try to go through and correct what you wrote below. Best, Bill.

On Jan 12, 2025, at 03:11, ToddAndMargo via perl6-users <perl6-us...@perl.org> 
wrote:

Hi Bill,

Please correct my notes.

Many thanks,
-T



Explanation:
my @y = $x ~~ m:g/ <?before ^ | download > .*? <?before \> | \h+ > /;

`m:g`     # match and global

CORRECT

`\...\`   # the constrains (beginning and end) of the match

NO, backslashes are used to escape non-alphanumeric characters, denote 
invisible characters (e.g. `\n`), etc.

`<...>`   # constraints of instructions inside the match

NO, `<?after ... >` is a lookbehind and `<?before ... >` is a lookahead.




First instruction: `<?before ^ | download >`

NO, this should just be the literal string `download` (or `"download"`)


  `?download ^`   # positive look-behind, match but don't capture `download `
                  # `^` means "look behind"

  `|`             # This is logical "OR"

  `download `     # positive look-behind, match but don't capture `download `

   summary: capture everything behind `before ` or capture just `download`


Second instruction: `.*?`
   `.*?`       # any-character, one-or-more, frugal up to the third instruction

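MOSTLY CORRECT: `.` is any-character and the trailing `?` makes it frugal, but `*` means 
zero-or-more (one-or-more would be `+`).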



Third instruction: `<?before \> | \h+ >`

NO, SIMPLIFY THIS TO `<?before \> >` and the match will stop when it encounters ">", the "greater-than" 
sign (angle). Because you're using a lookahead (which checks that a pattern comes next without consuming or 
capturing it), the ">" angle doesn't get captured.
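
Putting the pieces together (literal `download`, frugal `.*?`, lookahead for ">"), a sketch might look like this. The contents of `$x` are an assumption on my part, since your real input isn't shown here:

```raku
# Hypothetical input, not the real $x.
my $x = '<a href="download/file.zip">file</a>';

# Literal "download", then as few characters as possible,
# stopping just before the ">" (which is not captured).
my @y = $x ~~ m:g/ download .*? <?before \> > /;
.put for @y;    # download/file.zip"
```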


   `<?before \>`  # positive look-ahead, match but don't capture `download \>`

KINDA, the actual construct is `<?before \> >` or (even more readable) 
`<?before ">" >`

                  # Note that the `\` in `\>` is escaping the `>` and is 
removing

KINDA, putting a `\` backslash in front of a non-alphanumeric character to match it 
literally is a rule in Raku. If it isn't backslashed (or quoted), Raku will try to 
interpret the non-alphanumeric as a metacharacter.

                  # the `>` from the instructions constraints and making it part
                  # of the match

The unescaped `>` is part of the lookahead/lookbehind construct, either 
`<?after ... >` (lookbehind) or `<?before ... >` (lookahead).


  `|`             # This is logical "OR"

YES


  `\h+ `          # one-or-more horizontal whitespace character

YES


  summary: capture everything before `before` or one-or-more whitespace 
characters

KINDA. Match the previous tokens, and stop matching just before you find either 
a ">" or one-or-more horizontal whitespace characters.
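
A small sketch of how the `| \h+` alternative changes where the match stops, using an invented test string `$x`:

```raku
my $x = 'download foo>bar';    # hypothetical test string

# Stop before ">" only: the space is matched by `.*?` like any other character.
say ($x ~~ / download .*? <?before \> > /).Str;          # download foo

# Stop before ">" OR horizontal whitespace: the match ends at the space.
say ($x ~~ / download .*? <?before \> | \h+ > /).Str;    # download
```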


HTH.
