On Fri, Aug 15, 2008 at 10:53 PM, John Joyce <[EMAIL PROTECTED]> wrote:
> Right now, I'm toying with using Flex/Lex in a Cocoa project.
> Unfortunately, I don't see a reliable or easy way to handle NSStrings
> correctly all the time with Flex.
> Does anybody have any suggestions for such text handling and reliable
> unicode aware regexes?
> I'm seriously not interested in implementing such details in C with Flex.
> Flex is fast and cool for that, but if it's going to be stupidly difficult
> to use reliably with other languages on a mac, it's not a good idea for me.
Depending on exactly what you need, Unicode awareness can be fairly straightforward.

Commonly, Unicode support in regexes is only needed to pass through undifferentiated blobs of text with ASCII delimiters. For example, imagine parsing a CSV file that may have Unicode text inside the quotes. In this case, you can convert the file to UTF-8, and constructs like "." will match it. Every byte of a non-ASCII character in UTF-8 falls in the range 128-255, so as long as your scanner passes those bytes through untouched, you'll be fine. (Flex builds an 8-bit scanner by default; just don't force a 7-bit one with -7.)

But be aware of some potential problem areas:

- Each non-ASCII character will be more than one byte, and flex will treat it as more than one character. Write your regexes accordingly. In particular, avoid length limits on runs of arbitrary characters (a pattern like .{1,10} counts bytes, not characters), and avoid using non-ASCII characters directly in your regexes.

- It's very difficult to split UTF-8 strings correctly. If you encounter a run of non-ASCII characters, make sure you follow that run all the way to the end, until you get back to ASCII. Don't write a regex that stops in the middle of the run and then expects your code to do something useful with the fragment.

- If you need to do something with non-ASCII characters besides reading them in one side and writing them out the other (for example, doing something special with all accented characters), then Flex is probably not the right answer.

Beyond that, it ought to be pretty straightforward. Since Flex passes your action code straight through to the compiler, you can write Objective-C in the actions (as long as you compile the result as Objective-C, of course!), convert the matched text from UTF-8 back to an NSString, and take things from there.

Mike
_______________________________________________
Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)
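To make the byte-versus-character pitfalls above concrete, here is a minimal plain-C sketch of the byte-level rules involved. It is not part of the original reply, and the helper names utf8_count_chars, utf8_safe_cut and utf8_skip_non_ascii are hypothetical. It assumes the input is valid UTF-8 held in a byte buffer with an explicit length (for example flex's yytext and yyleng). utf8_count_chars shows why byte counts and character counts differ, utf8_safe_cut backs a cut point up to a character boundary, and utf8_skip_non_ascii consumes a whole non-ASCII run, roughly what a flex pattern such as [\x80-\xff]+ would match.

```c
#include <assert.h>
#include <stddef.h>

/* In UTF-8, every byte of a multi-byte character is >= 0x80, and the bytes
   after the first one (continuation bytes) have the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Count characters (code points) rather than bytes: every byte that is not a
   continuation byte starts a new character. Assumes valid UTF-8 input. */
size_t utf8_count_chars(const char *s, size_t total)
{
    size_t count = 0;
    for (size_t i = 0; i < total; i++)
        if (!utf8_is_continuation((unsigned char)s[i]))
            count++;
    return count;
}

/* Return the largest offset <= want at which s can be cut without splitting
   a multi-byte character, so s[0..result) stays complete UTF-8
   (assuming s itself is valid UTF-8). */
size_t utf8_safe_cut(const char *s, size_t total, size_t want)
{
    assert(s != NULL || total == 0);
    if (want >= total)
        return total;
    /* A cut is only safe where the next byte starts a new character. */
    while (want > 0 && utf8_is_continuation((unsigned char)s[want]))
        want--;
    return want;
}

/* Starting at offset i, consume an entire run of non-ASCII bytes and return
   the offset of the next ASCII byte (or total). Like a flex rule that matches
   the whole run at once, this never stops partway through a character. */
size_t utf8_skip_non_ascii(const char *s, size_t total, size_t i)
{
    while (i < total && (unsigned char)s[i] >= 0x80)
        i++;
    return i;
}
```

For example, the 12-byte string "caf\xC3\xA9 na\xC3\xAFve" ("café naïve") contains 10 characters, and asking utf8_safe_cut for a cut at byte 4 returns 3, because byte 4 is the second half of "é". Once a complete run is in hand, an Objective-C action could turn it into an NSString with -[NSString initWithBytes:length:encoding:] and NSUTF8StringEncoding.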