Hi Peter,

On Sep 30, 2008, at 7:58 AM, Peter Edberg wrote:
> CFStringGetRangeOfComposedCharactersAtIndex and -[NSString rangeOfComposedCharacterSequenceAtIndex:] are the modern replacements for UCFindTextBreak with kUCTextBreakClusterMask, and indeed they are now closer to the original intent of kUCTextBreakClusterMask than the current implementation of kUCTextBreakClusterMask is (since UCFindTextBreak was converted to follow Unicode/ICU default text segmentation rules).
>
> The modern functions treat all of the following as a cluster:
> - A surrogate pair (of course, since it is a single character);
> - A base character followed by a sequence of combining marks (whether or not this is something that would be composed under NFC);
> - A Hangul syllable expressed as a sequence of conjoining jamo;
> - An Indic consonant cluster such as consonant + virama + consonant + vowel matra. It is this last cluster that is no longer treated as a single entity by UCFindTextBreak with kUCTextBreakClusterMask.

Ok, understood. This looks good. Based on the discussion I have updated my bug report 6253075. I think a "convenience" method that returns the cluster count would be very useful. It is probably faster than manually rolling a counter method using repeated calls to rangeOfComposedCharacterSequenceAtIndex:, and its simple availability would reduce some of the confusion that I sense on this list as to the most appropriate way to count "characters". There would then be "length" to count the number of UTF-16 units and "numberOfCharacters" to count the clusters, which are closest to the human conception of characters.
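
For reference, a manually rolled counter of the kind described above might look roughly like the sketch below. ClusterCount is a hypothetical helper name chosen for illustration, not an existing Foundation API; the only framework calls it relies on are -length, -rangeOfComposedCharacterSequenceAtIndex: and NSMaxRange.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): counts composed character
// sequences ("clusters") by stepping through the string one cluster at a time.
static NSUInteger ClusterCount(NSString *string) {
    NSUInteger count = 0;
    NSUInteger index = 0;
    NSUInteger length = [string length]; // number of UTF-16 code units

    while (index < length) {
        // Range of the whole cluster that contains the code unit at `index`.
        NSRange cluster = [string rangeOfComposedCharacterSequenceAtIndex:index];
        index = NSMaxRange(cluster); // jump to the first unit after the cluster
        count++;
    }
    return count;
}
```

With this, a string holding "e" followed by U+0301 (combining acute accent) has a length of 2 but a cluster count of 1, and a single surrogate pair likewise has a length of 2 and a cluster count of 1. Each call has to rescan from the given index, which is part of why a built-in method could plausibly be faster.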

Thanks,

david.
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
