# New Ticket Created by  raiph 
# Please include the string:  [perl #125927]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=125927 >


jnthn++ and others are busy with work that is far more important and urgent 
than dealing with this right now. I'm filing this bug now because there are 
reasons to consider addressing it before Christmas, as explained below.

What I did
==========

say "नि".chars

What I expected
===============

1

What I got
==========

2

-----------------

Some reasons why I think it's appropriate to classify नि as a single grapheme:

1. It's the last of 4 sample single graphemes in the "Extended Grapheme 
Clusters" section of the Unicode Standard Annex #29 on Text Segmentation: 
http://www.unicode.org/reports/tr29/tr29-27.html#Table_Sample_Grapheme_Clusters

(The Unicode standard suggests aiming at Extended Grapheme Clusters at a 
minimum if an implementation wishes to deal with grapheme clusters.)

2. It's the first example in S15: 
https://raw.githubusercontent.com/perl6/specs/master/S15-unicode.pod

3. It behaves as a single unit for selection in my browser. (You too?)

--------

The bug I'm reporting in this RT was discussed briefly today on IRC:

jnthn    m: say "नि".NFC.list.say
camelia  OUTPUT«2344 2367␤True␤»
jnthn    m: say uniprop(2367, 'Canonical_Combining_Class')
camelia  OUTPUT«0␤»
jnthn    ... combiners are identified in the NFG algo by having a CCC > 0

----------

So, presumably, the CCC > 0 condition is insufficient for identifying 
combiners if NFG is to match Unicode's default extended grapheme cluster 
definition: it misses the second codepoint (2367) of a sample grapheme that 
the Unicode standard saw fit to highlight. It's this latter point, that नि has 
become a go-to example on the net, that's one of the main reasons I'm filing 
this bug; I don't otherwise use Devanagari!
