On Fri, Jan 30, 2009 at 03:30:02AM -0800, Darren Duncan wrote:
> pugs-comm...@feather.perl6.nl wrote:
>>  In the abstract, Perl is written in Unicode, and has consistent Unicode
>> -semantics regardless of the underlying text representations.
>> +semantics regardless of the underlying text representations.  By default
>> +Perl presents Unicode in "NFG" formation, where each grapheme counts as
>> +one character.  A grapheme is what the novice user would think of as a
>> +character in their normal everyday life, including any diacritics.
>
> What's with this NFG / Normal Form G that you refer to?  I don't see any 
> mention of that in http://unicode.org/reports/tr15/ ... did you mean NFC?

Nope, this is a Perl/Parrot idea.  It started out with a notion of mine a
year ago.  Search for 'grapheme' in

    http://use.perl.org/~chromatic/journal/35461

We named it NFG about the time Simon Cozens wrote a PDD for it for Parrot.
At the moment it's much better specced in Parrotland than in P6land.  See

    http://www.parrotcode.org/docs/pdd/pdd28_strings.html

NFG stands for Normalization Form G, where the G is short for
"grapheme".  And before anyone asks, yes, we were aware of the other
gloss for NFG when we picked it.  :)

> For that matter, is it possible for all realistic combinations of 
> diacritics and base letters to be represented by a single Unicode 
> codepoint, including all language-dependent graphemes?

No.  That is the vision of NFC, but there is a potentially infinite number of
graphemes that can be composed in Unicode.  NFG aims to represent each
of those locally as a single integer, and translate back out to a more
standard normalization form on output.
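To make the idea above concrete, here is a minimal Python sketch of an NFG-style table.  It is not Parrot's implementation (see PDD 28 for that); the names GraphemeTable, split_graphemes, to_nfg and from_nfg are hypothetical.  Two assumptions are made purely for illustration: multi-codepoint graphemes get negative "synthetic" integer IDs, and graphemes are found by the crude rule "base character plus any following characters with a nonzero canonical combining class", which is much simpler than real grapheme segmentation (Unicode UAX #29).

```python
import unicodedata


class GraphemeTable:
    """Hypothetical NFG-style table mapping each grapheme to one integer.

    Single-codepoint graphemes keep their codepoint; multi-codepoint
    graphemes get a negative synthetic ID on first sight (an assumption
    made for this sketch).
    """

    def __init__(self):
        self._to_id = {}
        self._from_id = {}

    def intern(self, cluster: str) -> int:
        if len(cluster) == 1:
            return ord(cluster)
        if cluster not in self._to_id:
            new_id = -(len(self._to_id) + 1)  # -1, -2, -3, ...
            self._to_id[cluster] = new_id
            self._from_id[new_id] = cluster
        return self._to_id[cluster]

    def lookup(self, gid: int) -> str:
        return chr(gid) if gid >= 0 else self._from_id[gid]


def split_graphemes(text: str) -> list:
    """Crude segmentation: attach characters with a nonzero canonical
    combining class to the preceding cluster.  Not UAX #29: it ignores
    Hangul jamo, ZWJ sequences, spacing marks, and more."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def to_nfg(text: str, table: GraphemeTable) -> list:
    # Normalize to NFC first so canonically equivalent spellings
    # end up with the same integer.
    nfc = unicodedata.normalize("NFC", text)
    return [table.intern(g) for g in split_graphemes(nfc)]


def from_nfg(ids: list, table: GraphemeTable, form: str = "NFC") -> str:
    # Translate back out to a standard normalization form on output.
    return unicodedata.normalize(form, "".join(table.lookup(i) for i in ids))


if __name__ == "__main__":
    table = GraphemeTable()
    # "a" + acute + circumflex + tilde, then "bc": three graphemes.
    ids = to_nfg("a\u0301\u0302\u0303bc", table)
    print(ids)  # [-1, 98, 99]
    print(from_nfg(ids, table) == unicodedata.normalize("NFC", "a\u0301\u0302\u0303bc"))
```

Each grapheme then costs exactly one integer, so string length and indexing count graphemes rather than codepoints.  The table is local to the process, which is why output has to go back through a standard form such as NFC.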

> I thought NFC sort of did one codepoint per grapheme but there were a few 
> exceptions ... I could be wrong on that point.

You are correct, NFC doesn't do all that we want.
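A quick way to see where NFC falls short, using only Python's standard unicodedata module: NFC composes a base letter and mark into one codepoint when Unicode happens to have a precomposed character, but a grapheme with no precomposed form stays as several codepoints.

```python
import unicodedata

# "e" + combining acute (U+0301) has a precomposed form, U+00E9,
# so NFC yields one codepoint for this grapheme.
simple = unicodedata.normalize("NFC", "e\u0301")

# "a" + acute + circumflex + tilde (U+0301, U+0302, U+0303) is one grapheme
# to a reader, but no single codepoint exists for it.  NFC composes
# "a" + acute into U+00E1 and leaves the other marks as separate codepoints.
stacked = unicodedata.normalize("NFC", "a\u0301\u0302\u0303")

print(len(simple), len(stacked))
```

So counting NFC codepoints only matches counting graphemes when every grapheme in the string has a precomposed form.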

By the way, we could use someone to write the Perl 6 Unicode synopsis,
based on PDD 28.

Larry
