glenn andreas wrote:

[wrote about how using regex is not a good idea, particularly with NSString and Unicode. Pretty much the same things that Jens wrote earlier.]

Yes, that's all very true. Regex is a poor choice if you're working with non-ASCII text. I'm generally not doing so, but just yesterday I had the unpleasant experience of running regexes over some UTF-16 files. (See another email by me in this thread.)

You could kludge it to work using some options that are available in the Mac OS X and FreeBSD regex libraries. (I don't know whether they are available elsewhere, but they likely are.) Essentially, you tell regcomp to treat NULs as ordinary characters, and then you have a lot of fun coding REs that match your UTF-16 strings, taking into account endianness and all. I've pondered how it would work and am confident that it would, but I also concede that it would be a very ugly hack and prone to breakage.
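
To make the shape of that hack concrete, here is a minimal sketch in Python rather than with regcomp itself. (The BSD options in question are presumably REG_PEND and REG_STARTEND, but that is an assumption, not something tested here.) It matches a literal against raw UTF-16 bytes for a chosen byte order and discards hits that do not start on a 2-byte boundary. The names utf16_pattern and find_utf16 are made up for illustration.

```python
import re

def utf16_pattern(literal, byteorder="little"):
    """Compile a bytes regex matching `literal` as encoded in UTF-16."""
    if not literal:
        raise ValueError("literal must be non-empty")
    # The pattern bytes depend on endianness: "a" is 61 00 (LE) or 00 61 (BE).
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return re.compile(re.escape(literal.encode(codec)))

def find_utf16(pattern, data):
    """Yield UTF-16 code-unit indices of non-overlapping, aligned matches."""
    pos = 0
    while (m := pattern.search(data, pos)) is not None:
        if m.start() % 2 == 0:
            # Aligned on a code-unit boundary: a real match.
            yield m.start() // 2
            pos = m.end()
        else:
            # The match straddles two code units, so it is a false hit.
            pos = m.start() + 1

# U+6100 followed by U+0100 encodes (little-endian) as 00 61 00 01, which
# contains the bytes 61 00 at an odd offset: a false hit for "a".
data = "\u6100\u0100a".encode("utf-16-le")
print(list(find_utf16(utf16_pattern("a"), data)))
```

The alignment check is exactly the kind of detail that makes this approach prone to breakage: without it, the search above would report a match in the middle of two unrelated characters.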

One other possible solution is to use JavaScriptCore and make a JSStringRef (which works with unichars like NSString), and use JavaScript's regex support - that way the results will at least have consistent indices, work well with non-ASCII characters, etc...
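
The point about consistent indices can be illustrated with a small Python sketch. It assumes the regex ran over UTF-8 bytes and that the result is needed as an index in UTF-16 code units (unichars), which is how NSString counts. The function name utf8_offset_to_utf16_index is hypothetical.

```python
def utf8_offset_to_utf16_index(text, byte_offset):
    """Convert a byte offset into the UTF-8 encoding of `text` to an index
    in UTF-16 code units."""
    # Decoding raises UnicodeDecodeError if the offset splits a character.
    prefix = text.encode("utf-8")[:byte_offset].decode("utf-8")
    # Every UTF-16 code unit is two bytes wide.
    return len(prefix.encode("utf-16-le")) // 2

text = "na\u00efve \U0001F600 caf\u00e9"
byte_offset = text.encode("utf-8").find("caf\u00e9".encode("utf-8"))
print(byte_offset, utf8_offset_to_utf16_index(text, byte_offset))
```

Here the byte offset and the unichar index differ because U+00EF takes two bytes in UTF-8 but one code unit in UTF-16, while U+1F600 takes four bytes in UTF-8 and two code units (a surrogate pair) in UTF-16. A regex engine that works on unichars directly avoids this conversion.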

That is an excellent option if you're using JavaScriptCore already, or maybe even if you're not. It's another thing to look into. Anyone for a Unicode text editor that is scriptable in JavaScript? (Hmm, maybe the world really doesn't need another text editor.) :P

For now, I'm going to look into ICU. I seem to have a couple of copies of it on my computer.

Cheers,
Jason
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)