At 06:13 27/08/2009, Ian Eyberg wrote:
>I have text that looks like:
>
>  'b...@l^@a...@h^@'
>
>(most of the time the text is simply 'blah')
>and then it should come out like this:
>
>  'blah'
[...]
>  UCODE : '\u0000'{ $channel = HIDDEN; };
>
>I'm reading in through antlrinputstream as "UTF8" as I do
>want to support multi-byte chars and I have rules to help
>that such as:

I think you're going about this the wrong way. The input above looks like UTF-16; you should detect that case and use a UTF16 file stream instead of a UTF8 one. (Normally Unicode files will start with a BOM you can use for auto-detection.) UTF-16 and UTF-8 encode high-order Unicode characters quite differently, so if your input can include them then trying to read it as UTF8 and just throwing away the nulls definitely isn't going to work.
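To make the BOM suggestion concrete, here is a rough sketch of the detection step in plain Java. The class and method names (BomSniffer, detectCharset, decode) are made up for illustration and are not part of ANTLR. It assumes the whole input is already in a byte array, handles only the UTF-8 and UTF-16 BOMs (UTF-32 is ignored), and falls back to UTF-8 when no BOM is present.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of ANTLR: picks a charset from a leading BOM.
public class BomSniffer {

    // True if data begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the charset implied by the BOM; assumes UTF-8 when there is none.
    // UTF-32 BOMs are deliberately not handled in this sketch.
    public static Charset detectCharset(byte[] data) {
        if (startsWith(data, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        return StandardCharsets.UTF_8;
    }

    // Number of BOM bytes to skip before decoding.
    public static int bomLength(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return 3;
        }
        if (startsWith(data, 0xFF, 0xFE) || startsWith(data, 0xFE, 0xFF)) {
            return 2;
        }
        return 0;
    }

    // Decodes the bytes with the detected charset, dropping the BOM itself.
    public static String decode(byte[] data) {
        int skip = bomLength(data);
        return new String(data, skip, data.length - skip, detectCharset(data));
    }

    public static void main(String[] args) {
        // "blah" as UTF-16LE with a BOM: every letter is followed by a 0x00 byte.
        byte[] utf16 = {(byte) 0xFF, (byte) 0xFE, 'b', 0, 'l', 0, 'a', 0, 'h', 0};
        System.out.println(detectCharset(utf16)); // UTF-16LE
        System.out.println(decode(utf16));        // blah
    }
}
```

Once you know the encoding, you can hand its name to the stream instead of hard-coding "UTF8". As far as I recall, the ANTLR 3 Java runtime has ANTLRInputStream(InputStream, String) and ANTLRFileStream(String, String) constructors that take an encoding name, but check that against the version you are using. With the input decoded as UTF-16, the nulls never reach the lexer, so a rule like UCODE should not be needed for this case.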