Re: Using Flex/Lex in a Cocoa project

Ricky Sharp Mon, 18 Aug 2008 13:56:02 -0700


On Aug 18, 2008, at 3:40 PM, mm w wrote:

to avoid the splitting problem

(c < 128) ? "%c" : "\\u%04x", c);


I'm not sure what this solves.

Per Michael's e-mail below, this is indeed a difficult problem. UTF-8 is just a particular scheme to store Unicode strings. Operating on individual bytes in such streams will most likely not make any sense.
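
To illustrate why byte-wise operations go wrong, here is a minimal sketch in C. The function names are hypothetical, and it assumes well-formed UTF-8 (it does not reject overlong or out-of-range sequences). It shows that the byte count and the code point count of a string differ as soon as a non-ASCII character appears:

```c
#include <stddef.h>
#include <string.h>

/* Length in bytes of the UTF-8 sequence that starts with lead byte b.
   Returns 0 for a continuation byte (10xxxxxx) or an unrecognized lead byte.
   Hypothetical helper; it does not validate the bytes that follow. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count code points by counting every byte that is not a continuation
   byte. Assumes s is NUL-terminated, well-formed UTF-8. */
static size_t utf8_count_code_points(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For "caf\xC3\xA9" (the word "café"), strlen() reports 5 bytes while the function above counts 4 code points, so cutting the buffer after byte 4 would leave half a character behind.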

What I would do is pick some normalized form and operate on that data. For a recent feature at my day job, we normalized all input CSV files to UTF-16BE. We were able to handle all of our customer data so far. The final solution still isn't 100% Unicode-savvy (e.g. it does crap out with surrogate pairs), but we have unit tests to expose/document such limitations. And, customer data doesn't yet have such things.
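
For reference, the surrogate-pair case looks roughly like this in C. This is only a sketch with hypothetical function names, not the code we shipped. It reads big-endian 16-bit units from a byte buffer and combines a high/low surrogate pair into a single code point:

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 16-bit code unit from p (hypothetical helper). */
static uint16_t read_u16be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Decode one code point from UTF-16BE bytes at buf (len bytes available).
   Stores the code point in *cp and returns the number of bytes consumed
   (2 or 4). Returns 0 on truncated input or an unpaired surrogate. */
static size_t utf16be_decode(const unsigned char *buf, size_t len, uint32_t *cp)
{
    uint16_t hi, lo;

    if (len < 2) return 0;
    hi = read_u16be(buf);
    if (hi < 0xD800 || hi > 0xDFFF) {  /* ordinary BMP code unit */
        *cp = hi;
        return 2;
    }
    if (hi > 0xDBFF) return 0;         /* low surrogate with no high before it */
    if (len < 4) return 0;             /* high surrogate at end of input */
    lo = read_u16be(buf + 2);
    if (lo < 0xDC00 || lo > 0xDFFF) return 0;  /* high not followed by low */
    *cp = 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return 4;
}
```

A unit test for the limitation can feed in the bytes D8 34 DD 1E (U+1D11E) and check that they come back as one code point rather than two.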

On Sat, Aug 16, 2008 at 7:43 AM, Michael Ash <[EMAIL PROTECTED]> wrote:

- It's very difficult to split UTF-8 strings correctly. If you
encounter a run of non-ASCII characters, ensure that you follow that
run through the end, until you get back to ASCII. Don't have a regex
that stops in the middle of it and then expects your code to be able
to do something useful with it.
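
One way to read the advice above, sketched in C under the assumption that the input is UTF-8 and that only ASCII bytes are safe split points. The function name is hypothetical; this is not taken from Flex or from anyone's actual scanner:

```c
#include <stddef.h>

/* Given a proposed split offset pos in s (len bytes long), move it forward
   past any non-ASCII bytes (high bit set), so that the split lands on an
   ASCII byte or at the end of the buffer. An offset that already points at
   an ASCII byte is returned unchanged. */
static size_t split_after_non_ascii_run(const char *s, size_t len, size_t pos)
{
    while (pos < len && ((unsigned char)s[pos] & 0x80))
        pos++;
    return pos;
}
```

This is deliberately conservative: it keeps a whole run of non-ASCII characters together instead of trying to find character boundaries inside the run.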


___________________________________________________________
Ricky A. Sharp         mailto:[EMAIL PROTECTED]
Instant Interactive(tm)   http://www.instantinteractive.com



