Re: I need help understanding a match

William Michels via perl6-users Mon, 13 Jan 2025 18:21:15 -0800

Hi Todd, 

First I should apologize for one of my earlier posts. The first token was a bit 
of a jumble. I think now you just want the literal string "download" to start 
your capture.

As per usual, I tried a few different approaches to your regex problem and 
posted what I thought was the best one. However, an older iteration crept into 
one of my email posts: it used `^`, which is Raku's zero-width "start-of-string" 
regex token. 

If you use `^` you will capture from the start-of-string onward, in this case 
through the `.*?` any-character token and up to the `\>` angle. You may not want 
this, as it means the word "download" isn't required for you to capture 
that sequence of characters. 
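
Here's a minimal sketch of that difference. The sample string in `$x` is made up for illustration (an assumption, not your real data), and the variable names are mine:

```raku
# Hypothetical sample string (an assumption, not your real data)
my $x = 'see <a href="download/file.zip"> here';

# Anchored with `^`: the match starts at the beginning of the string,
# whether or not the text there is "download"
my $anchored = $x ~~ m/ ^ .*? <?before \> > /;

# Literal `download`: the match can only start where that word appears
my $literal = $x ~~ m/ download .*? <?before \> > /;

put $anchored.Str;   # see <a href="download/file.zip"
put $literal.Str;    # download/file.zip"
```

In both cases the frugal `.*?` stops just before the first ">", so the only thing that changes is where the match is allowed to begin.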

I'm not sure where you got the impression that `\...\` actually means anything 
specific in Raku. If you're asking for a match against alphanumeric characters 
in Raku, you don't have to escape them. Anything else (e.g. punctuation) you'll 
have to escape. So if you're trying to match ">", the "greater-than" 
sign (angle), you'll have to escape it via a backslash (e.g. `\>`), or by 
quoting (e.g. `">"`). 
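
A quick sketch of the two escaping styles, using a made-up string (the names `$s`, `$escaped` and `$quoted` are just for illustration):

```raku
my $s = 'a > b';               # hypothetical sample string

my $escaped = $s ~~ m/ \> /;   # backslash-escaped angle
my $quoted  = $s ~~ m/ ">" /;  # quoted angle

say $escaped;   # ｢>｣
say $quoted;    # ｢>｣
```

Both forms match the same single character, so which one you use is mostly a question of readability.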

For non-alphanumeric characters, an unescaped punctuation character is 
reserved for special "metacharacter" purposes: for example, an unescaped "." dot 
means "any-character". You'll also note backslashing is used to denote characters 
that are difficult to represent otherwise. Think, for example, how `\n` means 
newline and `\t` means tab. There are others: `\s` means whitespace, `\h` means 
horizontal whitespace, and `\v` means vertical whitespace. Also `\S` means 
non-whitespace, `\H` means non-horizontal-whitespace, and `\V` means 
non-vertical-whitespace.
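
To see those character classes in action, here is a small sketch that counts them in a made-up string (the string and variable names are assumptions for illustration):

```raku
# One space, one tab, one newline, and four letters
my $t = "a b\tc\nd";

my $horizontal = $t.comb(/\h/).elems;   # space and tab
my $vertical   = $t.comb(/\v/).elems;   # newline
my $any-space  = $t.comb(/\s/).elems;   # all three of the above
my $non-space  = $t.comb(/\S/).join;    # everything else

say $horizontal;   # 2
say $vertical;     # 1
say $any-space;    # 3
say $non-space;    # abcd
```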

I've also posted direct links to Raku regex forms, such as `<?before ... >` (a 
positive lookahead) and `<?after ... >` (a positive lookbehind). You can try 
this in the REPL: 

[0] > my $a = "XYZ"
XYZ
[1] > say $a ~~ m/ <?after X > Y <?before Z > /;
｢Y｣

Try reading that out loud in English, "say $a smartmatching against a requested 
`m` match comprising after-X, Y, before-Z". If you read it that way, you'll 
understand why only the `｢Y｣` ends up in the match variable. You can also 
`andthen` the smartmatch, which will put the match in the `$_` topic variable 
for you, which can help with stringification:

[2] > $a ~~ m/ <?after X > Y <?before Z > / andthen put $_.Str;
Y

I'll try to go through and correct what you wrote below. Best, Bill.

> On Jan 12, 2025, at 03:11, ToddAndMargo via perl6-users 
> <perl6-us...@perl.org> wrote:
> 
> Hi Bill,
> 
> Please correct my notes.
> 
> Many thanks,
> -T
> 
> 
> 
> Explanation:
> my @y = $x ~~ m:g/ <?before ^ | download > .*? <?before \> | \h+ > /;
> 
> `m:g`     # match and global
CORRECT
> `\...\`   # the constrains (beginning and end) of the match
NO, backslashes are used to escape non-alphanumeric characters, denote 
invisible characters (e.g. `\n`), etc. 
> `<...>`   # constraints of instructions inside the match
NO, `<?after ... >` is a lookbehind and `<?before ... >` is a lookahead.
> 
> 
> 
> First instruction: `<?before ^ | download >`
NO, this should just be the literal string `download` (or `"download"`)
> 
>  `?download ^`   # positive look-behind, match but don`t capture `download `
>                  # `^` means "look behind"
> 
>  `|`             # This is logical "OR"
> 
>  `download `     # positive look-behind, match but don`t capture `download `
> 
>   summary: capture everything behind `before ` or capture just `download`
> 
> 
> Second instruction: `.*?`
>   `.*?`       # any-character, one-or-more, frugal up to the third instruction
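MOSTLY CORRECT: `.` is any-character and `?` makes it frugal, but `*` means 
zero-or-more (one-or-more would be `+`).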
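
A tiny sketch of the quantifier, since `*` allows zero characters while `+` requires at least one (the string 'ab' is just a made-up test case):

```raku
# Nothing sits between "a" and "b" in this string
my $zero-or-more = 'ab' ~~ m/ a .*? b /;   # `*` is happy with zero characters
my $one-or-more  = 'ab' ~~ m/ a .+? b /;   # `+` needs at least one character

say $zero-or-more;    # ｢ab｣
say ?$one-or-more;    # False
```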
> 
> 
> Third instruction: `<?before \> | \h+ >`
NO, SIMPLIFY THIS TO `<?before \> >` and the match will stop when it encounters 
">" the "greater-than" sign (angle). Because you're using a lookahead (match 
characters and "look ahead" to find a pattern, but don't capture it), the 
">" angle doesn't get captured.
> 
>   `<?before \>`  # positive look-ahead, match but don`t capture `download \>`
KINDA, the actual construct is `<?before \> >` or (even more readable), 
`<?before ">" >`
>                  # Note that the `\` in `\>` is escaping the `>` and is 
> removing
KINDA, backslashing a non-alphanumeric to match it literally is a rule in Raku. 
If it isn't backslashed (or quoted), Raku will try to interpret the 
non-alphanumeric as a metacharacter.
>                  # the `>` from the instructions constraints and making is 
> part
>                  # of the match
The unescaped `>` is part of the lookahead/lookbehind construct, either 
`<?after ... >` (lookbehind) or `<?before ... >` (lookahead).
> 
>  `|`             # This is logical "OR"
YES
> 
>  `\h+ `          # one-or-more horizontal whitespace character
YES
> 
>  summary: capture everything before `before` or one-or-more whitespace 
> characters
KINDA. Match the previous tokens, and stop matching just before you find either 
a ">" angle or one-or-more horizontal whitespace characters.
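
To compare the two lookaheads side by side, here is a hedged sketch. The sample string is made up, the array names are mine, and I'm using the `.match` method with `:g`, which I believe behaves like the `m:g/.../` form in your notes:

```raku
# Hypothetical sample string (an assumption, not your real data)
my $x = '<a href="download/one.zip"> <a href="download/two.zip extra">';

# Stop only before a ">" angle
my @angle-only = $x.match(/ download .*? <?before \> > /, :g);

# Stop before a ">" angle OR horizontal whitespace, whichever comes first
my @angle-or-space = $x.match(/ download .*? <?before \> | \h+ > /, :g);

.Str.put for @angle-only;
# download/one.zip"
# download/two.zip extra"

.Str.put for @angle-or-space;
# download/one.zip"
# download/two.zip
```

The second link shows the difference: with the `\h+` alternative the match ends at the space, while the angle-only version runs on to just before the ">".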
> 
> 

HTH.
