On Wed, Mar 31, 2004 at 09:14:45AM +0200, elazar wrote:
> How can I convince flex and bison to do Hebrew (ISO-8859-8 or UTF-8)
> lexing or parsing?
> The question is divided into two parts:
> 1) how can I make it recognize Hebrew letters (so that flex won't
>    spit them out, telling me they're not defined)
> 2) how can I represent Hebrew text to flex (i.e. tokenize "\ABA")

IIUC bison/yacc has nothing to do with it: bison just consumes the tokens
that (f)lex produces. The problem here is how to use Hebrew when defining
those tokens.

ISO-8859-8 should be the simpler case. flex's man page mentions 8-bit
support, so I figure it is 8-bit clean.

UTF-8 is more complicated. Here is the FAQ entry "Can I fake multi-byte
character support?" from flex's manual:

  Flex has in it a widespread assumption that the input is processed
  one byte at a time. Fixing this is on the to-do list, but is involved,
  so it won't happen any time soon. In the interim, the best I can
  suggest (unless you want to try fixing it yourself) is to write your
  rules in terms of pairs of bytes, using definitions in the first
  section:

      X       \xfe\xc2
      ...
      %%
      foo{X}bar       found_foo_fe_c2_bar();

  etc. Definitely a pain - sorry about that.

-- 
Tzafrir Cohen                       +---------------------------+
http://www.technion.ac.il/~tzafrir/ |vim is a mutt's best friend|
mailto:[EMAIL PROTECTED]            +---------------------------+
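
P.S. To make the "pairs of bytes" idea concrete for Hebrew: the letters
alef..tav are U+05D0..U+05EA, which UTF-8 encodes as the byte 0xD7
followed by a byte in 0x90..0xAA. In ISO-8859-8 the same letters are the
single bytes 0xE0..0xFA. So a flex definition along the lines of
HEB \xd7[\x90-\xaa] (or [\xe0-\xfa] for ISO-8859-8) should do, though I
have not tested it. Below is a rough C sketch of the same byte-level
check, written by hand rather than generated by flex. The function names
are made up for illustration and are not part of flex or bison.

```c
#include <stddef.h>

/* Hebrew letters alef..tav are U+05D0..U+05EA. In UTF-8 each one is the
 * byte 0xD7 followed by a byte in the range 0x90..0xAA. */
int is_hebrew_utf8_pair(unsigned char lead, unsigned char trail)
{
    return lead == 0xD7 && trail >= 0x90 && trail <= 0xAA;
}

/* In ISO-8859-8 the same letters are the single bytes 0xE0..0xFA. */
int is_hebrew_iso8859_8(unsigned char c)
{
    return c >= 0xE0 && c <= 0xFA;
}

/* Hypothetical helper: length in bytes of the run of UTF-8 Hebrew
 * letters at the start of the NUL-terminated string s. This mimics
 * what a flex rule such as {HEB}+ would match at that position. */
size_t hebrew_run_utf8(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    /* p[n + 1] is safe to read: p[n] is not the terminator, so the
     * next byte is at worst the terminator itself. */
    while (p[n] != '\0' && is_hebrew_utf8_pair(p[n], p[n + 1]))
        n += 2;
    return n;
}
```

For example, hebrew_run_utf8("\xd7\x90\xd7\x91\xd7\x90") covers all six
bytes (three letters), while a string starting with ASCII gives 0. This
only handles the 27 base letters; points (nikud) and punctuation live
elsewhere in the U+0590..U+05FF block and would need their own ranges.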