On Aug 12, 2008, at 8:41 AM, Kyle Sluder wrote:
> On Mon, Aug 11, 2008 at 9:30 PM, Deborah Goldsmith <[EMAIL PROTECTED]> wrote:
>> Anyone who is considering writing code that looks through the contents of an NSString (as opposed to just treating the whole string as a unit) needs to learn the basics of processing Unicode.
>
> Joel Spolsky has a great primer on just how deep the Unicode rabbit
> hole goes, entitled "The Absolute Minimum Every Software Developer
> Absolutely, Positively Must Know About Unicode and Character Sets (No
> Excuses!)":
>
> http://www.joelonsoftware.com/articles/Unicode.html

That article is missing several concepts which are essential for understanding Unicode; like many programmers, Mr. Spolsky thinks of Unicode as "wide ASCII", which it is not. The article doesn't cover surrogate pairs (the fact that he uses the term UCS-2 instead of UTF-16 shows he's not up to date) or combining sequences (grapheme clusters). If you're going to go groveling through Unicode text, you need to understand both.
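
To make those two concepts concrete, here is a rough sketch in Python rather than Cocoa. The helper names utf16_length and approx_clusters are made up for this illustration, and the clustering is a deliberate simplification: it only attaches combining marks to the preceding base character, which is much less than the full grapheme cluster rules in Unicode Standard Annex #29. In real Cocoa code you would ask NSString for composed character sequences instead of rolling your own.

```python
import unicodedata


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit that NSString's -length reports."""
    return len(text.encode("utf-16-le")) // 2


def approx_clusters(text: str) -> list:
    """Group each base character with the combining marks that follow it.

    Simplified on purpose: real grapheme cluster segmentation (UAX #29)
    also handles Hangul jamo, ZWJ sequences, regional indicators, etc.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            # Non-zero combining class: attach the mark to the previous cluster.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane
e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

# One code point, but two UTF-16 code units (a surrogate pair).
print(len(clef), utf16_length(clef))

# Two code points and two UTF-16 code units, but one user-perceived character.
print(len(e_acute), utf16_length(e_acute), len(approx_clusters(e_acute)))
```

The point is that "number of UTF-16 code units", "number of code points", and "number of characters the user sees" are three different counts, and code that indexes into a string one unit at a time can split either a surrogate pair or a combining sequence.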

This article is a bit stuffy, but also more complete, and is even shorter (I think):

http://unicode.org/standard/principles.html

This is also good:

http://icu-project.org/userguide/unicodeBasics.html

Also, Unicode does not, and likely never will, contain the Klingon script. There was a proposal to encode it, but it was rejected because the Klingon user community (yes, it exists: http://www.amazon.com/Klingon-Hamlet-Lawrence-Schoen/dp/0671035789/) does not use the script: they write Klingon using ASCII (e.g., "tlhIngan Hol"). A script doesn't get encoded in Unicode unless there is actually a user community for it.

That doesn't mean that fictional scripts are prohibited. There are proposals to encode Tengwar and Cirth, for example, as these have (small) user communities. :-)

http://std.dkuug.dk/JTC1/SC2/WG2/docs/n1641/n1641.htm

They've been languishing since 1997 due to more pressing work for the Unicode Technical Committee, so I wouldn't plan on writing Quenya or Sindarin in Unicode any time soon...

Deborah Goldsmith
Apple Inc.
[EMAIL PROTECTED]

_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)