Heads-up: the code in my last post was correct, but the output I pasted was not. Here is the actual output (Rakudo v2021.06):
~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive, /<[a..z]>**2/); .say for %digraphs.sort(-*.value);' richard3.txt
or => 4
rs => 3
ho => 3
se => 3
gd => 1
in => 1
fo => 1
om => 1
do => 1
ng => 1
ki => 1
my => 1

On Sat, Aug 27, 2022 at 10:45 AM William Michels <w...@caa.columbia.edu> wrote:
> Hi Marc (and Bruce)!
>
> I'm adapting a "word frequency" answer posted by Sean McAfee on this list.
> The key seems to be adding the `:exhaustive` adverb to the `match` call.
> AFAIK comb will not accept this adverb, so `match` will have to do for now:
>
> Sample Input (including quotes): “A horse, a horse, my kingdom for a horse!”
>
> ~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive, /<[a..z]>**2/); .say for %digraphs.sort(-*.value);'
>
> Sample Output:
>
> or 1 => 4
> se 1 => 3
> rs 1 => 3
> ho 1 => 3
> in 1 => 1
> my 1 => 1
> om 1 => 1
> ki 1 => 1
> ng 1 => 1
> do 1 => 1
> gd 1 => 1
> fo 1 => 1
>
> HTH, Bill.
>
>
> On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <bruce.g...@acm.org> wrote:
>
>> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <m...@unistra.fr> wrote:
>>
>> --snip--
>>
>> > but I think it is possible to move the cursor backward in the comb regex.
>>
>> --snip--
>>
>> I do *not* think you can ("move the cursor backward in the comb regex");
>> See https://docs.raku.org/routine/comb :
>> ... "returns a Seq of non-overlapping matches" ...
>> The "non-overlapping" nature is the problem.
>> (Please let me know if this turns out to be incorrect!)
>>
>> In foresight, Raku has added an optional `:exhaustive` flag to regex
>> matching, and that will do what you want.
>>
>> This Raku code:
>>
>>     my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
>>     .say for %digraphs.sort({ -.value, ~.key });
>>
>> , produces output identical to this Perl code:
>>
>>     perl -lnE '
>>         END { say "$_ => $digraph{$_}" for
>>                   sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
>>                   keys %digraph
>>         }
>>         $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
>>     ' Camelia.svg
>>
>> , when run against a downloaded copy of our mascot:
>> https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>>
>> --
>> Hope this helps,
>> Bruce Gray (Util of PerlMonks)
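To put Bruce's point next to the sample input from my earlier message, here is a minimal sketch. It is my own illustration, not code from Bruce's or Sean's posts. It hard-codes the input string instead of calling `slurp`. The side-by-side comparison uses `:overlap`. For a fixed-width pattern like `<[a..z]> ** 2`, that should give the same matches as `:exhaustive`, which can additionally return several matches starting at the same position.

```raku
# Illustrative sketch: inputs are hard-coded rather than read with slurp.
my Str $word = 'horse';

# comb resumes scanning after the end of each match, so matches never overlap.
my @combed = $word.comb(/<[a..z]> ** 2/);
say @combed.join(' ');        # ho rs

# :overlap starts the next match attempt one character after the previous
# match's start, so adjacent digraphs can share a letter.
my @overlapping = $word.match(/<[a..z]> ** 2/, :overlap)».Str;
say @overlapping.join(' ');   # ho or rs se

# Count overlapping digraphs in the sample input with a Bag, as Bruce does.
my Str $text = '“A horse, a horse, my kingdom for a horse!”';
my %digraphs = $text.lc.match(/<[a..z]> ** 2/, :exhaustive)».Str.Bag;

# Highest count first; ties broken alphabetically by digraph.
.say for %digraphs.sort({ -.value, ~.key });
```

Against that sample input, the last loop should print `or => 4` first. Next come `ho`, `rs` and `se`, with 3 each. The remaining eight digraphs follow, with a count of 1 each, in alphabetical order.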