Hi Todd,
First I should apologize for one of my earlier posts. The first token was a bit of a
jumble. I think now you just want the literal string "download" to start your
capture.
As per usual I tried a few different approaches to your regex problem, and posted what I
thought was the best one. However, an older iteration crept into one of my email posts: it
used `^`, which is Raku's zero-width "start-of-string" regex token.
If you use `^` you will capture from the start-of-string onward, in this case through the
`.*?` any-character token and up to the \> angle. You may not want this as it actually
means the word "download" isn't required for you to capture that sequence of
characters.
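Here is a small sketch of that difference. The sample string and the variable names are made up for illustration (they are not from your data), and I've used a plain `[ ... ]` group rather than the exact regex from my earlier post:

```raku
my $x = 'abc>download=foo>';   # hypothetical sample string

# With `^` as an alternative, the match may begin at start-of-string,
# so the word "download" is not required:
my $with-caret = $x ~~ m/ [ ^ | download ] .*? <?before \> > /;
put $with-caret.Str;    # abc

# With only the literal string, the match has to begin at "download":
my $literal = $x ~~ m/ download .*? <?before \> > /;
put $literal.Str;       # download=foo
```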
I'm not sure where you got the impression that `\...\` actually means anything specific in Raku. If you're asking for
a match against alphanumeric characters in Raku you don't have to escape them. Anything else (e.g. punctuation) you'll
have to escape. So this means if you're trying to match ">" the "greater-than" sign (angle),
you'll have to escape it via a backslash (e.g. `\>`), or by quoting (e.g. ">").
For non-alphanumeric characters, an unescaped punctuation character is reserved for special
"metacharacter" purposes: for example an unescaped "." dot means
"any-character". You'll also note backslashing used to denote characters that are difficult to
represent otherwise. Think for example how `\n` means newline, `\t` means tab. There are others: `\s` means
whitespace, `\h` means horizontal-whitespace, and `\v` means vertical-whitespace. Also `\S` means
non-whitespace, `\H` means non-horizontal-whitespace, and `\V` means non-vertical-whitespace.
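A quick sketch of those rules, using a made-up sample string (`$s` is just an illustrative name):

```raku
my $s = "a>b\tc";    # hypothetical sample: contains ">" and a tab

say so $s ~~ / \> /;       # True:  backslash-escaped literal ">"
say so $s ~~ / ">" /;      # True:  quoted literal ">"
say so $s ~~ / a . b /;    # True:  unescaped "." matches any character (here ">")
say so $s ~~ / a \. b /;   # False: escaped "." only matches a literal dot
say so $s ~~ / \h /;       # True:  a tab is horizontal whitespace
say so $s ~~ / \v /;       # False: no vertical whitespace (e.g. newline) present
```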
I've also posted direct links to Raku regex forms, such as `<?before ... >` (a
positive lookahead) and `<?after ... >` (a positive lookbehind). You can try this in
the REPL:
[0] > my $a = "XYZ"
XYZ
[1] > say $a ~~ m/ <?after X > Y <?before Z > /;
「Y」
Try reading that out loud in English, "say $a smartmatching against a requested `m`
match comprising after-X, Y, before-Z". If you read it that way, you'll understand
why only the `「Y」` ends up in the match variable. You can also `andthen` the smartmatch,
which will put the match in the `$_` topic variable for you, which can help with
stringification:
[1] > $a ~~ m/ <?after X > Y <?before Z > / andthen put $_.Str;
Y
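To see what the lookarounds buy you, here is a short sketch comparing the same string matched with and without them (the variable names `$plain` and `$around` are only for illustration):

```raku
my $a = "XYZ";

# No lookarounds: all three letters are consumed and end up in the match.
my $plain = $a ~~ m/ X Y Z /;
put $plain.Str;     # XYZ

# Lookbehind and lookahead are zero-width: X and Z must be present,
# but only Y is consumed.
my $around = $a ~~ m/ <?after X > Y <?before Z > /;
put $around.Str;    # Y
put $around.from;   # 1 (the match starts after the X)
put $around.to;     # 2 (and ends before the Z)
```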
I'll try to go through and correct what you wrote below. Best, Bill.
On Jan 12, 2025, at 03:11, ToddAndMargo via perl6-users <perl6-us...@perl.org>
wrote:
Hi Bill,
Please correct my notes.
Many thanks,
-T
Explanation:
my @y = $x ~~ m:g/ <?before ^ | download > .*? <?before \> | \h+ > /;
`m:g` # match and global
CORRECT
`\...\` # the constrains (beginning and end) of the match
NO, backslashes are used to escape non-alphanumeric characters, denote
invisible characters (e.g. `\n`), etc.
`<...>` # constraints of instructions inside the match
NO, `<?after ... >` is a lookbehind and `<?before ... >` is a lookahead.
First instruction: `<?before ^ | download >`
NO, this should just be the literal string `download` (or `"download"`)
`?download ^` # positive look-behind, match but don't capture `download `
# `^` means "look behind"
`|` # This is logical "OR"
`download ` # positive look-behind, match but don't capture `download `
summary: capture everything behind `before ` or capture just `download`
Second instruction: `.*?`
`.*?` # any-character, one-or-more, frugal up to the third instruction
MOSTLY CORRECT, except `*` means zero-or-more (one-or-more is `+`). The trailing `?` is what makes it frugal.
Third instruction: `<?before \> | \h+ >`
NO, SIMPLIFY THIS TO `<?before \> >` and the match will stop when it encounters ">" the "greater-than"
sign (angle). Because you're using a lookahead (which peeks ahead for a pattern without consuming or capturing it),
the ">" angle doesn't get captured.
`<?before \>` # positive look-ahead, match but don't capture `download \>`
KINDA, the actual construct is `<?before \> >` or (even more readable), `<?before ">" >`
# Note that the `\` in `\>` is escaping the `>` and is removing
KINDA, backslashing a non-alphanumeric character to match it literally is a rule in Raku. If
it isn't backslashed (or quoted), Raku will try to interpret the non-alphanumeric as a
metacharacter.
# the `>` from the instructions constraints and making is part
# of the match
The unescaped `>` is part of the lookahead/lookbehind construct, either
`<?after ... >` (lookbehind) or `<?before ... >` (lookahead).
`|` # This is logical "OR"
YES
`\h+ ` # one-or-more horizontal whitespace character
YES
summary: capture everything before `before` or one-or-more whitespace
characters
KINDA. Match the previous tokens, and stop matching just before you find either
a ">" or one-or-more horizontal whitespace characters.
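Putting the pieces together, this is roughly how I'd expect the pattern to behave. The sample strings are invented (I don't have your real `$x`), so treat this as a sketch to adapt rather than the final answer:

```raku
my $x = 'download=alpha> junk download=beta>';   # hypothetical sample string

# Literal "download", then frugal any-character, stopping just before ">".
my @y = $x ~~ m:g/ download .*? <?before \> > /;
put .Str for @y;
# download=alpha
# download=beta

# Your original third token also stops at horizontal whitespace,
# whichever comes first:
my $m = 'download alpha>' ~~ m/ download .*? <?before \> | \h+ > /;
put $m.Str;    # download
```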
HTH.