On Wed, Mar 31, 2004 at 09:14:45AM +0200, elazar wrote:
> How can I convince flex and bison to do Hebrew (ISO-8859-8 or UTF-8)
> lexing or parsing?
> The question is divided into two parts:
> 1) How can I make flex recognize Hebrew letters (so that it won't
> reject them, telling me they are not defined)?
> 2) How can I represent Hebrew text to flex (i.e. tokenize "\ABA")?

IIUC bison/yacc has nothing to do with it: bison only consumes the
tokens that (f)lex hands it. The problem here is how to use Hebrew
when defining those tokens in flex.

ISO-8859-8 should probably be simpler, since every letter is a single
byte. I saw that flex's man page mentions 8bit, so I figure it is
8-bit clean.
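
A minimal sketch of the ISO-8859-8 case, assuming the usual layout of
that encoding, where the letters alef through tav occupy bytes 0xE0 to
0xFA. The definition name HEB and the printf action are only for
illustration, and I have not tried this against any particular flex
version:

```lex
%option 8bit noyywrap
%{
#include <stdio.h>
%}
/* Assumption: ISO-8859-8 input, Hebrew letters are bytes 0xE0..0xFA. */
HEB     [\xe0-\xfa]
%%
{HEB}+      { printf("Hebrew word, %d letters\n", (int)yyleng); }
.|\n        { /* skip every other byte */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Something like "flex heb.l && cc lex.yy.c" should build it; the scanner
then reads standard input.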

UTF-8 is more complicated. To quote the FAQ entry "Can I fake
multi-byte character support?" from flex's manual:

     Flex has in it a widespread assumption that the input is processed
     one byte at a time.  Fixing this is on the to-do list, but is involved,
     so it won't happen any time soon.  In the interim, the best I can suggest
     (unless you want to try fixing it yourself) is to write your rules in
     terms of pairs of bytes, using definitions in the first section:
     
        X       \xfe\xc2
        ...
        %%
        foo{X}bar       found_foo_fe_c2_bar();
     
     etc.  Definitely a pain - sorry about that.
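
Applied to Hebrew, the byte-pair trick from the FAQ could look roughly
like the sketch below. It assumes UTF-8 input, where the letters U+05D0
(alef) through U+05EA (tav) are each encoded as the byte 0xD7 followed
by a byte in the range 0x90 to 0xAA. The names HEB and ABA, and the
choice of the word alef-bet-alef as a keyword, are made up for the
example:

```lex
%option 8bit noyywrap
%{
#include <stdio.h>
%}
/* Assumption: UTF-8 input. One Hebrew letter = 0xD7 + 0x90..0xAA. */
HEB     \xd7[\x90-\xaa]
/* Hypothetical keyword: alef, bet, alef spelled out byte by byte. */
ABA     \xd7\x90\xd7\x91\xd7\x90
%%
{ABA}       { printf("keyword\n"); }
{HEB}+      { printf("Hebrew word, %d letters\n", (int)yyleng / 2); }
.|\n        { /* skip every other byte */ }
%%
int main(void)
{
    yylex();
    return 0;
}
```

Since flex prefers the longest match and, on a tie, the earlier rule,
the keyword rule only wins when the word is exactly those three
letters; a longer Hebrew word that merely starts with them falls
through to the general rule.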


-- 
Tzafrir Cohen                       +---------------------------+
http://www.technion.ac.il/~tzafrir/ |vim is a mutt's best friend|
mailto:[EMAIL PROTECTED]       +---------------------------+
