Hi Marc (and Bruce)! I'm adapting a "word frequency" answer posted by Sean McAfee on this list. The key seems to be adding the `:exhaustive` adverb to the `match` call. AFAIK `comb` will not accept this adverb, so `match` will have to do for now:

Sample Input (including quotes):

“A horse, a horse, my kingdom for a horse!”

~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive, /<[a..z]>**2/); .say for %digraphs.sort(-*.value);'

Sample Output:

or 1 => 4
se 1 => 3
rs 1 => 3
ho 1 => 3
in 1 => 1
my 1 => 1
om 1 => 1
ki 1 => 1
ng 1 => 1
do 1 => 1
gd 1 => 1
fo 1 => 1

HTH, Bill.

On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <bruce.g...@acm.org> wrote:
>
>
> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <m...@unistra.fr> wrote:
> > --snip--
> > > but I think it is possible to move the cursor backward in the comb regex.
> > --snip--
>
> I do *not* think you can ("move the cursor backward in the comb regex");
> See https://docs.raku.org/routine/comb :
>     ... "returns a Seq of non-overlapping matches" ...
> The "non-overlapping" nature is the problem.
> (Please let me know if this turns out to be incorrect!)
>
> In foresight, Raku has added an optional `:exhaustive` flag to regex
> matching, and that will do what you want.
> This Raku code:
>
>     my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
>     .say for %digraphs.sort({ -.value, ~.key });
>
> , produces output identical to this Perl code:
>
>     perl -lnE '
>         END { say "$_ => $digraph{$_}" for
>                   sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
>                   keys %digraph
>         }
>         $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
>     ' Camelia.svg
>
> , when run against a downloaded copy of our mascot:
>     https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>
> --
> Hope this helps,
> Bruce Gray (Util of PerlMonks)
>
>
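
P.S. To make the `comb` versus `match(:exhaustive)` difference discussed above concrete, here is a minimal sketch on a single word. The helper name `digraph-counts` is hypothetical (it is not from the thread); its body follows the `».Str.Bag` idiom in Bruce's code, and I'm assuming it behaves the same on a plain string as on slurped input.

```raku
# Hypothetical helper, not from the thread: count overlapping letter pairs.
sub digraph-counts(Str $text --> Bag) {
    # :exhaustive asks for a match at every possible start position,
    # so adjacent pairs are allowed to overlap.
    $text.lc.match(:exhaustive, /<[a..z]> ** 2/)».Str.Bag
}

my $word = 'horse';

# comb returns non-overlapping matches: only the pairs "ho" and "rs".
say $word.comb(/<[a..z]> ** 2/);

# match with :exhaustive also returns the pairs "or" and "se".
say $word.match(:exhaustive, /<[a..z]> ** 2/)».Str;

# Count of the pair "or" in a two-horse sample.
say digraph-counts('A horse, a horse')<or>;
```

The only design choice here is wrapping the one-liner in a sub so the counting step can be reused; the regex and the adverb are the same ones used in the one-liners above.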