Hi Marc (and Bruce)!

I'm adapting a "word frequency" answer posted by Sean McAfee on this list.
The key seems to be adding the `:exhaustive` adverb to the `match` call.
AFAIK `comb` will not accept this adverb, so `match` will have to do for now:

Sample Input (including quotes):  “A horse, a horse, my kingdom for a
horse!”

~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
/<[a..z]>**2/); .say for %digraphs.sort(-*.value);'

Sample Output:

or    1 => 4
se    1 => 3
rs    1 => 3
ho    1 => 3
in    1 => 1
my    1 => 1
om    1 => 1
ki    1 => 1
ng    1 => 1
do    1 => 1
gd    1 => 1
fo    1 => 1
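
To show what the adverb changes, here is a minimal sketch (my own illustration, not taken from Sean's or Bruce's posts) that runs the same two-letter regex over the single word "horse" with three different adverbs. The variable names are only for illustration. As far as I understand it, `:overlap` should be enough here as well, because every match has a fixed length of two characters.

```raku
my $text = 'horse';
my $rx   = /<[a..z]> ** 2/;

# :global resumes searching after the end of each match,
# so consecutive pairs never share a letter.
my @global = $text.match($rx, :global)».Str;

# :overlap tries every starting position, keeping one match per position.
my @overlap = $text.match($rx, :overlap)».Str;

# :exhaustive returns every possible match from every starting position;
# with a fixed-length regex that is again one match per position.
my @exhaustive = $text.match($rx, :exhaustive)».Str;

say "global:     @global[]";      # ho rs
say "overlap:    @overlap[]";     # ho or rs se
say "exhaustive: @exhaustive[]";  # ho or rs se
```

With plain `:global` the digraphs "or" and "se" are never seen, which is why the counts come out too low without `:exhaustive` (or `:overlap`).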

HTH, Bill.


On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <bruce.g...@acm.org> wrote:

>
>
> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <m...@unistra.fr> wrote:
>
> --snip--
>
> > but I think it is possible to move the cursor backward in the comb regex.
>
> --snip--
>
> I do *not* think you can ("move the cursor backward in the comb regex");
> See https://docs.raku.org/routine/comb :
>         ... "returns a Seq of non-overlapping matches" ...
> The "non-overlapping" nature is the problem.
> (Please let me know if this turns out to be incorrect!)
>
> With foresight, Raku added an optional `:exhaustive` flag to regex
> matching, and that will do what you want.
> This Raku code:
>
>         my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
>         .say for %digraphs.sort({ -.value, ~.key });
>
> , produces output identical to this Perl code:
>
>     perl -lnE '
>         END { say "$_ => $digraph{$_}" for
>             sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
>             keys %digraph
>         }
>         $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
>     ' Camelia.svg
>
> , when run against a downloaded copy of our mascot:
>         https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>
> --
> Hope this helps,
> Bruce Gray (Util of PerlMonks)
>
>
