In #raku it was mentioned that it would be nice to have a $*UNICODE variable of 
sorts that reports back the version, but I'm not sure how that would work from 
an implementation POV.
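
As a rough stopgap, assuming you have a MoarVM source checkout handy, the 
version can be pulled out of the header comment in unicode_db.c (the same line 
Bill's grep turns up in the quoted message).  This is only a sketch: the 
function name unicode_version is made up for illustration, and it reports what 
the source header says rather than asking a running Raku anything.

```shell
# Hypothetical helper: print the Unicode version named in the header of a
# MoarVM unicode_db.c file. Assumes the header still contains a URL of the
# form http://www.unicode.org/versions/Unicode<version>/
unicode_version() {
    grep -o 'versions/Unicode[0-9][0-9.]*' "$1" |
        head -n 1 |
        sed 's|versions/Unicode||'
}

# Usage (substitute the path to your own checkout):
#   unicode_version path/to/MoarVM/src/strings/unicode_db.c
```

If the header format ever changes, this silently prints nothing, which is one 
more reason a proper variable would be nicer.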

I'm also late to the discussion, so pardon me jumping back a bit.  Basically, 
ICU is something that lets you quickly add in robust Unicode support.  But it's 
also a Swiss Army knife and overkill for what Raku generally needs (at 
whichever level it's implemented), and also limiting in some ways because you 
become beholden to its structures, which, as Samantha pointed out, doesn't work 
for MoarVM's approach.  Rolling your own has a lot of advantages.

Beyond UCD and UCA (sorting), everything else really should go into module land 
since it's heavily based on an ever-changing and growing CLDR, and even 
then, there are good arguments for putting sorting in module space too. 
For reasons like performance, code clarity, data size, etc., companies have 
rolled their own ICU-like libraries (Google's Closure for JS, TwitterCLDR in 
Ruby, etc.) running on the same CLDR data.  In Raku (shameless self-plug), a lot 
is already available in the Intl namespace.  There are actually some very cool 
things that can be done mixing CLDR and Raku, like creating new 
character-class-like tokens or even extending built-ins; they just don't have 
any business being near core, just... core-like :-)

Matéu


PS: For understanding some of Samantha's incredible work, her talks at the 
Amsterdam convention are really great, and Perl Weekly has an archive of her 
grant write-ups:
  Articles: https://perlweekly.com/a/samantha-mcvey.html
  High End Unicode in Perl 6: https://www.youtube.com/watch?v=Oj_lgf7A2LM
  Unicode Internals of Perl 6: https://www.youtube.com/watch?v=9Vv7nUUDdeA
  

> On Sep 29, 2020, at 3:14 PM, William Michels via perl6-users 
> <perl6-us...@perl.org> wrote:
> 
> Thank you, Samantha!
> 
> An outstanding question is one posed by Joseph Brenner--that
> is--knowing which version of the Unicode standard is supported by
> Raku. I grepped through two files, one called "unicode.c" and the
> other called "unicode_db.c". They're both located in rakudo at:
> /rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/ .
> 
> Below are the first 4 lines of my grep results. As you can see
> (above/below), rakudo-2020.06 supports Unicode12.1.0:
> 
> ~$ raku -ne '.say if .grep(/unicode/)'
> ~/rakudo/rakudo-2020.06/nqp/MoarVM/src/strings/unicode_db.c
> # For terms of use, see http://www.unicode.org/terms_of_use.html
> # The UAXes can be accessed at http://www.unicode.org/versions/Unicode12.1.0/
> From http://unicode.org/copyright.html#Exhibit1 on 2017-11-28:
> Distributed under the Terms of Use in http://www.unicode.org/copyright.html.
> <TRUNCATED>
> 
> It would be really interesting to follow your Unicode work, Samantha.
> The ideas you propose are interesting and everyone hopes for speed
> improvements. Is there any place Raku-uns can go to read
> updates--maybe a grant report, blog, or Github issue? Or maybe right
> here, on the Perl6-Users mailing list? Thanks in advance.
> 
> Best, Bill.
> 
> W. Michels, Ph.D.
> 
> 
> 
> On Sun, Sep 27, 2020 at 4:03 AM Samantha McVey <samant...@posteo.net> wrote:
>> 
>> So MoarVM uses its own database of the UCD. One nice thing is this can
>> probably be faster than calling into ICU to look up information for each
>> codepoint in a long string. Secondly, it implements its own text data
>> structures, so the nice features of ICU for that would be difficult to
>> use.
>> 
>> In my opinion, it could make sense to use ICU for things like localized
>> collation (sorting). It also could make sense to use ICU for unicode
>> properties lookup for properties that don't have to do with grapheme
>> segmentation or casing. This would be a lot of work, but if something like
>> this were implemented it would probably happen in the context of a larger
>> rethinking of how we use Unicode. Everything is complicated, though, by the
>> fact that we support lots of complicated regular expressions on different
>> Unicode properties. I guess I'd start by benchmarking the speed of ICU and
>> comparing it to the current implementation.
>> 
>> 
