Heads-up: the code in my last post was correct, but the sample output was not.
Here is the actual output (Rakudo v2021.06):

~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
/<[a..z]>**2/); .say for %digraphs.sort(-*.value);' richard3.txt
or => 4
rs => 3
ho => 3
se => 3
gd => 1
in => 1
fo => 1
om => 1
do => 1
ng => 1
ki => 1
my => 1
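To make the `comb` vs. `:exhaustive` difference concrete, here is a minimal Raku sketch run against the sample line from this thread. The line is hard-coded instead of slurped so the script is self-contained. The sliding-window `substr` scan is an added comparison in the spirit of the Perl `--pos` trick quoted below. Nobody in the thread proposed that code.

```raku
# A sketch contrasting non-overlapping and overlapping two-letter matches.
# The sample line from this thread is hard-coded (instead of using `slurp`)
# so the script runs on its own.
my $text = '“A horse, a horse, my kingdom for a horse!”'.lc;

# comb returns non-overlapping matches: once "ho" is consumed, scanning
# resumes at "r", so the "or" inside "horse" is never tried.
my @plain = $text.comb(/<[a..z]> ** 2/);

# :exhaustive (short form :ex) returns every possible match, including
# ones that overlap the previous match by one character.
my @overlapping = $text.match(/<[a..z]> ** 2/, :exhaustive)».Str;

# Added alternative (not from the thread): take the two-character window
# at every start position, much like backing up with Perl's `--pos`,
# and keep only the windows made entirely of letters.
my @window = (0 .. $text.chars - 2).map({ $text.substr($_, 2) }).grep(/^ <[a..z]> ** 2 $/);

# A Bag is a multiset; assigning it to a hash gives digraph => count.
my %plain-count      = @plain.Bag;
my %exhaustive-count = @overlapping.Bag;

say 'comb:        ', @plain.elems, ' digraphs';
say 'exhaustive:  ', @overlapping.elems, ' digraphs';
say 'window scan: ', @window.elems, ' digraphs';

# Highest count first; ties broken alphabetically by digraph.
.say for %exhaustive-count.sort({ -.value, ~.key });
```

Run with `raku`, it should report 11 non-overlapping digraphs for `comb` against 21 for both `:exhaustive` and the window scan. The frequency list should match the counts above, with ties listed alphabetically.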

On Sat, Aug 27, 2022 at 10:45 AM William Michels <w...@caa.columbia.edu>
wrote:

> Hi Marc (and Bruce)!
>
> I'm adapting a "word frequency" answer posted by Sean McAfee on this list.
> The key seems to be adding the `:exhaustive` adverb to the `match` call.
> AFAIK `comb` will not accept this adverb, so `match` will have to do for now:
>
> Sample Input (including quotes):  “A horse, a horse, my kingdom for a
> horse!”
>
> ~$ raku -e '++(my %digraphs){$_} for slurp.lc.match(:global, :exhaustive,
> /<[a..z]>**2/); .say for %digraphs.sort(-*.value);'
>
> Sample Output:
>
> or    1 => 4
> se    1 => 3
> rs    1 => 3
> ho    1 => 3
> in    1 => 1
> my    1 => 1
> om    1 => 1
> ki    1 => 1
> ng    1 => 1
> do    1 => 1
> gd    1 => 1
> fo    1 => 1
>
> HTH, Bill.
>
>
> On Sat, Aug 27, 2022 at 10:25 AM Bruce Gray <bruce.g...@acm.org> wrote:
>
>>
>>
>> > On Aug 27, 2022, at 10:56 AM, Marc Chantreux <m...@unistra.fr> wrote:
>>
>> --snip--
>>
>> > but I think it is possible to move the cursor backward in the comb
>> regex.
>>
>> --snip--
>>
>> I do *not* think you can ("move the cursor backward in the comb regex").
>> See https://docs.raku.org/routine/comb :
>>         ... "returns a Seq of non-overlapping matches" ...
>> The "non-overlapping" nature is the problem.
>> (Please let me know if this turns out to be incorrect!)
>>
>> With foresight, Raku has added an optional `:exhaustive` flag to regex
>> matching, and that will do what you want.
>> This Raku code:
>>
>>         my %digraphs = slurp.lc.match(:exhaustive, /(<[a..z]> ** 2)/)».Str.Bag;
>>         .say for %digraphs.sort({ -.value, ~.key });
>>
>> , produces output identical to this Perl code:
>>
>>     perl -lnE '
>>         END { say "$_ => $digraph{$_}" for
>>             sort { $digraph{$b} <=> $digraph{$a} || $a cmp $b }
>>             keys %digraph
>>         }
>>         $_=lc; while (/([a-z]{2})/g) {++$digraph{$1}; --pos; }
>>     ' Camelia.svg
>>
>> , when run against a downloaded copy of our mascot:
>>         https://upload.wikimedia.org/wikipedia/commons/8/85/Camelia.svg
>>
>> --
>> Hope this helps,
>> Bruce Gray (Util of PerlMonks)
>>
>>
