On 7 Nov 2009, at 14:17, Ryan Homer wrote:

On 2009-11-06, at 12:42 PM, Clark Cox wrote:

Is "ü" a single character, or two characters?

When you define a string using ü, isn't it stored internally as one UTF-16 code unit (not sure if I'm using the notation correctly), represented as U+00FC (which is one code unit),

No. It could be either U+00FC or the decomposed form U+0075 U+0308. It depends how it has been entered (wherever you enter it). This, incidentally, is one reason that it isn't trivial for the compiler to support character encodings; if your character encoding was ISO-8859-1 (ISO Latin 1) and you entered L"ü" (or @"ü") or similar, should that be represented by the precomposed sequence, or the decomposed sequence? And how about if you convert your source code to some other form where the accent is necessarily represented by a combining character?

You can only really guarantee that you have one or other form by asking for a particular canonical form; NSString has methods for that (e.g. -precomposedStringWithCanonicalMapping), but of course not all composed character sequences can be represented with precomposed characters in any case, and there's still the issue of surrogates, so this wouldn't really solve your problem.
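
To make the two forms concrete, here is a minimal sketch in Python rather than Objective-C, using the standard unicodedata module. It assumes that NFC normalization is a fair stand-in for what -precomposedStringWithCanonicalMapping does; the variable names are purely illustrative.

```python
import unicodedata

precomposed = "\u00fc"   # "ü" as the single code point U+00FC
decomposed = "u\u0308"   # "u" followed by U+0308 COMBINING DIAERESIS

# The two strings look the same on screen but compare unequal as stored.
equal_as_stored = precomposed == decomposed

# Asking for a canonical form removes the ambiguity.
nfc = unicodedata.normalize("NFC", decomposed)    # composes where possible
nfd = unicodedata.normalize("NFD", precomposed)   # always decomposes

# Not every sequence has a precomposed character: as far as I know there is
# no precomposed "q with diaeresis", so NFC leaves this as two code points.
still_two = unicodedata.normalize("NFC", "q\u0308")

# Surrogates are a separate issue: U+1D11E (musical G clef) is one code
# point but takes two UTF-16 code units.
clef_units = len("\U0001D11E".encode("utf-16-le")) // 2
```

So normalizing helps when comparing strings, but it does not make "one character" equal "one code unit".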

... then you can use -rangeOfComposedCharacterSequenceAtIndex: to find the range of indices (representing a single "character") that contain the given index.

THANKS! This solves my problem.
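
For anyone curious what such a lookup involves, here is a rough Python sketch of the idea behind -rangeOfComposedCharacterSequenceAtIndex:. It is not Apple's implementation. The function composed_range is hypothetical, and it approximates a composed character sequence as a base code point plus any following code points with a non-zero canonical combining class; the real segmentation rules (Unicode Standard Annex #29) cover more cases, such as Hangul jamo and joiners.

```python
import unicodedata

def composed_range(s, index):
    """Return (location, length), in UTF-16 code units, of the composed
    sequence that contains the code unit at `index` (approximation only)."""
    # Record where each code point starts and how many UTF-16 units it needs.
    spans = []
    pos = 0
    for ch in s:
        width = 2 if ord(ch) > 0xFFFF else 1   # surrogate pair or single unit
        spans.append((pos, width, ch))
        pos += width
    if not 0 <= index < pos:
        raise IndexError("index out of range")

    # Find the code point that covers the requested code unit.
    i = next(k for k, (start, width, _) in enumerate(spans)
             if start <= index < start + width)

    # Walk back over combining marks to the base character...
    first = i
    while first > 0 and unicodedata.combining(spans[first][2]) != 0:
        first -= 1
    # ...and forward over any combining marks that follow.
    last = i
    while last + 1 < len(spans) and unicodedata.combining(spans[last + 1][2]) != 0:
        last += 1

    start = spans[first][0]
    end = spans[last][0] + spans[last][1]
    return (start, end - start)

# "u" + combining diaeresis, then "x", then a character outside the BMP.
sample = "u\u0308x\U0001D11E"
```

With this sample, asking for index 0 or 1 gives the same two-unit range for the decomposed "ü", and asking for either half of the surrogate pair gives the range covering both units.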

If you don't already have it, and you're going to get into text processing with Cocoa, it's a good idea to grab yourself a copy of the Unicode book, and maybe (since the Unicode book itself is pretty dry) a companion such as Richard Gillam's Unicode Demystified:

<http://www.amazon.com/Unicode-Standard-Version-5-0-5th/dp/0321480910>
<http://www.amazon.com/Unicode-Demystified-Practical-Programmers-Encoding/dp/0201700522>

(Of course, you can download chapters of the Unicode book from unicode.org. Personally I like having a hard copy, though it *is* a huge tome...)

Kind regards,

Alastair.

--
http://alastairs-place.net



_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
