> On 5 Apr 2017, at 07:49, Gerriet M. Denkmann <gerri...@icloud.com> wrote:
> 
> 
>> On 3 Apr 2017, at 05:58, Aki Inoue <a...@apple.com> wrote:
>> 
>>>> This is standard Unicode normalization behavior. Each Unicode
>>>> character is assigned a Canonical Combining Class, an integer value
>>>> that defines the canonical ordering of combining marks.
>>>> 
>>>> The Canonical Combining Class of THAI CHARACTER SARA UU is 103, and that
>>>> of THAI CHARACTER MAI EK is 107. So MAI EK always comes after SARA UU in
>>>> canonical order.
>>>> 
>>>> THAI CHARACTER SARA II, on the other hand, has class 0, which marks it as
>>>> a starter: it begins a new reordering segment, and canonical ordering
>>>> never moves marks across it. That's why it is not reordered with respect
>>>> to other Thai combining characters.
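To make the reordering concrete, here is a small sketch using Python's standard `unicodedata` module (plain Unicode NFD, not Apple's HFS+ variant). The constant names and the `ccc` helper are my own labels, not part of any API.

```python
import unicodedata

KO_KAI = "\u0e01"   # THAI CHARACTER KO KAI (a base consonant, class 0)
SARA_II = "\u0e35"  # THAI CHARACTER SARA II (top vowel, class 0)
SARA_UU = "\u0e39"  # THAI CHARACTER SARA UU (bottom vowel, class 103)
MAI_EK = "\u0e48"   # THAI CHARACTER MAI EK (tone mark, class 107)

def ccc(ch: str) -> int:
    """Canonical Combining Class of a single character."""
    return unicodedata.combining(ch)

# Tone mark typed before the bottom vowel: both are non-starters, so
# canonical ordering sorts them by class (103 before 107) and swaps them.
typed = KO_KAI + MAI_EK + SARA_UU
nfd = unicodedata.normalize("NFD", typed)

# Tone mark typed before the top vowel: SARA II has class 0 (a starter),
# which ends the reorderable run, so nothing moves.
typed_ii = KO_KAI + MAI_EK + SARA_II
nfd_ii = unicodedata.normalize("NFD", typed_ii)

if __name__ == "__main__":
    print([ccc(c) for c in (SARA_UU, MAI_EK, SARA_II)])  # [103, 107, 0]
    print(nfd == KO_KAI + SARA_UU + MAI_EK)              # True
    print(nfd_ii == typed_ii)                            # True
```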
> 
> I just read UNICODE NORMALIZATION FORMS <http://unicode.org/reports/tr15/>.
> 
> Under “11.1 Stability of Normalized Forms” it says: “A normalized string is 
> guaranteed to be stable; that is, once normalized, a string is normalized 
> according to all future versions of Unicode.”
> 
> That means: since consonant + tone mark + top vowel is already normalised 
> under current Unicode, no better normalisation could ever be introduced that 
> would turn it into consonant + top vowel + tone mark.
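A quick check of that reasoning with Python's `unicodedata` (a sketch; the variable names are mine): both spellings of consonant + SARA II + MAI EK are already in NFC and NFD, yet they are different strings, so normalising before comparing does not unify them, and the stability guarantee freezes that.

```python
import unicodedata

KO_KAI, SARA_II, MAI_EK = "\u0e01", "\u0e35", "\u0e48"

tone_first = KO_KAI + MAI_EK + SARA_II   # consonant + tone mark + top vowel
vowel_first = KO_KAI + SARA_II + MAI_EK  # consonant + top vowel + tone mark

def is_normalized(form: str, s: str) -> bool:
    # Same result as unicodedata.is_normalized (Python 3.8+), spelled out
    # so it also runs on older interpreters.
    return unicodedata.normalize(form, s) == s

for form in ("NFC", "NFD"):
    assert is_normalized(form, tone_first)
    assert is_normalized(form, vowel_first)

# Both spellings are already normalised, yet they remain distinct strings.
print(tone_first == vowel_first)  # False
```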
> 
> Apple uses (as far as I remember) a variant of Unicode’s canonical 
> decomposition form.

Yes, they do. I think this is due to the lack of backward compatibility in 
normalisation: I don't think a string normalised under a newer Unicode version 
is guaranteed to have the same normalisation under an older version.

If a new version of HFS started using a newer Unicode version, filenames would 
break when you mount the drive with an older driver. So HFS keeps using an old 
Unicode normalisation.

> So Apple could use a better normalisation.
> 
> But an existing filename that is not normalised under the new, better 
> normalisation rules would become inaccessible. Not good.
> Renormalising all filenames according to the new rules would probably be 
> rather expensive. 
> Also, some files would need to be renamed, because they would become 
> identical under the new rules. Kind of messy, but not too much.
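A rough sketch of the renaming pass described above. `hypothetical_better_normalize` is an invented rule used purely for illustration (move a tone mark after a directly following class-0 top vowel); it is not part of Unicode or of HFS+. The sketch assumes names are already stored decomposed, and it only reports which names would change and which would collide.

```python
import unicodedata
from collections import defaultdict

# HYPOTHETICAL rule for illustration only; not Unicode, not HFS+.
TONE_MARKS = {"\u0e48", "\u0e49", "\u0e4a", "\u0e4b"}           # MAI EK .. MAI CHATTAWA
TOP_VOWELS = {"\u0e31", "\u0e34", "\u0e35", "\u0e36", "\u0e37"}  # class-0 vowels above

def hypothetical_better_normalize(name: str) -> str:
    """NFD, then swap a tone mark with a directly following top vowel."""
    chars = list(unicodedata.normalize("NFD", name))
    i = 0
    while i < len(chars) - 1:
        if chars[i] in TONE_MARKS and chars[i + 1] in TOP_VOWELS:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)

def plan_renormalization(names):
    """Return (names whose stored form would change, groups that would collide)."""
    changed = [n for n in names if hypothetical_better_normalize(n) != n]
    groups = defaultdict(list)
    for n in names:
        groups[hypothetical_better_normalize(n)].append(n)
    collisions = [g for g in groups.values() if len(g) > 1]
    return changed, collisions

if __name__ == "__main__":
    a = "\u0e01\u0e48\u0e35.txt"  # consonant + tone mark + top vowel
    b = "\u0e01\u0e35\u0e48.txt"  # consonant + top vowel + tone mark
    changed, collisions = plan_renormalization([a, b, "readme.txt"])
    print(len(changed), len(collisions))  # 1 1
```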
> 
> So I do not really see a way out of this problem (created by some 
> questionable decisions of the Unicode people).
> 
> 
> Kind regards,
> 
> Gerriet.
> 
> P.S. A similar problem exists with NIKHAHIT + SARA AA versus SARA AM. They 
> look identical, but canonical normalisation does not unify them.
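The SARA AM case can be checked the same way with Python's `unicodedata` (a sketch; `forms_agree` is my own helper). SARA AM has only a compatibility decomposition to NIKHAHIT + SARA AA, so the canonical forms (NFC, NFD) keep the two spellings apart while the compatibility forms (NFKC, NFKD) merge them.

```python
import unicodedata

SARA_AM = "\u0e33"            # THAI CHARACTER SARA AM
NIKHAHIT_AA = "\u0e4d\u0e32"  # THAI CHARACTER NIKHAHIT + THAI CHARACTER SARA AA

def forms_agree(a: str, b: str) -> dict:
    """For each normalization form, report whether a and b normalise equal."""
    return {
        form: unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in ("NFC", "NFD", "NFKC", "NFKD")
    }

if __name__ == "__main__":
    print(forms_agree(SARA_AM, NIKHAHIT_AA))
    # {'NFC': False, 'NFD': False, 'NFKC': True, 'NFKD': True}
```

Compatibility forms are lossy in general, so they are not an obvious drop-in for filenames either.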

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
