Hello,

I need to break a string down into individual characters.

In English that's pretty easy.

But in some languages what a user perceives as a single block is actually a base character plus accents plus vowel markers plus tone markers plus...


e.g.:   เก

is made of

U+0E40 ( เ ) thai character sara e
U+0E01 ( ก ) thai character ko kai
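
To make the breakdown above concrete, here is a minimal sketch that lists the code points behind a block of text. It uses Python rather than Cocoa purely for illustration; the helper name describe_code_points is made up for this example, and the names come from whatever Unicode tables ship with the Python build.

```python
import unicodedata

def describe_code_points(text):
    """Return one 'U+XXXX NAME' string per code point in text.

    Hypothetical helper for illustration only; not part of any Cocoa API.
    """
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
    ]

# The two-code-point example from above: SARA E followed by KO KAI.
block = "\u0e40\u0e01"
for line in describe_code_points(block):
    print(line)
```

For the example above this lists U+0E40 and U+0E01 as two separate code points, even though a reader sees them as one block.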


To help with this, NSString has the methods:

        rangeOfComposedCharacterSequencesForRange:
        rangeOfComposedCharacterSequenceAtIndex:

and CFString has:

        CFStringGetRangeOfComposedCharactersAtIndex
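
As a rough sketch of the idea behind these calls (not how NSString or CFString actually implement them), the following groups each base character with the combining marks that follow it. It is again in Python for illustration, the name split_into_clusters is hypothetical, and treating "general category starts with M" as "belongs to the previous character" is a simplifying assumption that covers far less than the full TR29 rules.

```python
import unicodedata

def split_into_clusters(text):
    """Group each character with any combining marks that follow it.

    Simplified approximation: a code point whose general category is a
    mark (Mn, Mc, Me) is appended to the preceding cluster. This is an
    assumption made for illustration, not the TR29 algorithm.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if clusters and is_mark:
            clusters[-1] += ch      # attach the mark to the previous cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# 'e' + COMBINING ACUTE ACCENT stays together; 'a' stands alone.
print(split_into_clusters("e\u0301a"))
# Thai KO KAI + tone mark MAI THO stays together.
print(split_into_clusters("\u0e01\u0e49"))
```

One limitation of this approximation: a leading vowel such as U+0E40 is classified as a letter rather than a mark, so the sketch would not attach it to the consonant that follows.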



But then some languages, like German, will sometimes combine certain blocks together,

so SS becomes ß.

The document http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries seems to have some good information about this, so I'm not completely lost as to how to proceed. But this strikes me as one of those problems that other people have run into many times before.

Any suggestions would be deeply appreciated.

Thank you,

mathew