FWIW, since release 2023.02 there's a Unicode class with a class method .version:

$ raku -e 'say Unicode.version'
v15.0

> On 9 Dec 2024, at 15:36, William Michels <w...@caa.columbia.edu> wrote:
>
> Nudging this conversation, ...to follow progress since 2020.
>
> Anyone want to chime in?
>
> Is a $*UNICODE dynamic variable a possibility?
>
> Related: I'm re-reading Matéu's comment, which (I think) says to let ICU
> live in a module somewhere.
>
> Best Regards, Bill.
>
>> On Sep 29, 2020, at 21:19, Matthew Stuckwisch <ma...@softastur.org> wrote:
>>
>> In #raku it was mentioned that it would be nice to have a $*UNICODE variable
>> of sorts that reports back the version, but not sure how that would be from
>> an implementation POV.
>>
>> I'm also late to the discussion, so pardon me jumping back a bit.
>> Basically, ICU is something that lets you quickly add in robust Unicode
>> support. But it's also a Swiss Army knife and overkill for what Raku
>> generally needs (whatever it's implemented in), and also limiting in some
>> ways because you become beholden to its structures, which, as Samantha
>> pointed out, don't work for MoarVM's approach. Rolling your own has a lot
>> of advantages.
>>
>> Beyond UCD and UCA (sorting), everything else really should go into module
>> land since they're heavily based on an ever-changing and growing CLDR, and
>> even then, there can be good arguments made for putting sorting in module
>> space too. For reasons like performance, code clarity, data size, etc.,
>> companies have rolled their own ICU-like libraries (Google's Closure for JS,
>> TwitterCLDR in Ruby, etc.) running on the same CLDR data. In Raku (shameless
>> self-plug), a lot is already available in the Intl namespace. There are
>> actually some very cool things that can be done mixing CLDR and Raku, like
>> creating new character-class-like tokens, or even extending built-ins; they
>> just don't have any business being near core, just...
>> core-like :-)
>>
>> Matéu
>>
>>
>> PS: For understanding some of Samantha's incredible work, her talks at the
>> Amsterdam convention are really great, and Perl Weekly has an archive of her
>> grant write-ups:
>> Articles: https://perlweekly.com/a/samantha-mcvey.html
>> High End Unicode in Perl 6: https://www.youtube.com/watch?v=Oj_lgf7A2LM
>> Unicode Internals of Perl 6: https://www.youtube.com/watch?v=9Vv7nUUDdeA
>>
>>
>>> On Sep 29, 2020, at 3:14 PM, William Michels via perl6-users
>>> <perl6-us...@perl.org> wrote:
>>>
>>> Thank you, Samantha!
>>>
>>> An outstanding question is one posed by Joseph Brenner--that
>>> is--knowing which version of the Unicode standard is supported by
>>> Raku. I grepped through two files, one called "unicode.c" and the
>>> other called "unicode_db.c". They're both located in rakudo at:
>>> /rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .
>>>
>>> Below are the first 4 lines of my grep results. As you can see
>>> (above/below), rakudo-2020.06 supports Unicode12.1.0:
>>>
>>> ~$ raku -ne '.say if .grep(/unicode/)'
>>> ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
>>> # For terms of use, see http://www.unicode.org/terms_of_use.html
>>> # The UAXes can be accessed at
>>> http://www.unicode.org/versions/Unicode12.1.0/
>>> From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
>>> Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
>>> <TRUNCATED>
>>>
>>> It would be really interesting to follow your Unicode work, Samantha.
>>> The ideas you propose are interesting and everyone hopes for speed
>>> improvements. Is there any place Raku-uns can go to read
>>> updates--maybe a grant report, blog, or GitHub issue? Or maybe right
>>> here, on the Perl6-Users mailing list? Thanks in advance.
>>>
>>> Best, Bill.
>>>
>>> W. Michels, Ph.D.
>>>
>>>
>>>
>>> On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey <samant...@posteo.net> wrote:
>>>>
>>>> So MoarVM uses its own database of the UCD.
>>>> One nice thing is that this can probably be faster than calling into
>>>> ICU to look up information for each codepoint in a long string.
>>>> Secondly, it implements its own text data structures, so the nice
>>>> features of ICU for doing that would be difficult to use.
>>>>
>>>> In my opinion, it could make sense to use ICU for things like localized
>>>> collation (sorting). It also could make sense to use ICU for Unicode
>>>> property lookup for properties that don't have to do with grapheme
>>>> segmentation or casing. This would be a lot of work, but if something
>>>> like this were implemented it would probably happen in the context of a
>>>> larger rethinking of how we use Unicode. Though everything is
>>>> complicated by the fact that we support lots of complicated regular
>>>> expressions on different Unicode properties. I guess first I'd start by
>>>> benchmarking the speed of ICU and comparing it to the current
>>>> implementation.