At 06:13 27/08/2009, Ian Eyberg wrote:
 >I have text that looks like:
 >
 >  'b^@l^@a^@h^@'
 >
 >(most of the time the text is simply 'blah')
 >and then it should come out like this:
 >
 >  'blah'
[...]
 >  UCODE   : '\u0000'{ $channel = HIDDEN; };
 >
 >I'm reading in through antlrinputstream as "UTF8" as I do
 >want to support multi-byte chars and I have rules to help
 >that such as:

I think you're going about this the wrong way.  The input above 
looks like UTF-16; you should detect that case and use a UTF16 
file stream instead of a UTF8 one.  (Normally Unicode files will 
start with a BOM you can use for auto-detection.)
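
A minimal sketch of that kind of BOM-based auto-detection, written in 
Python for brevity rather than in the language of your parser.  The 
helper names detect_encoding and decode_text are hypothetical, and the 
fallback to UTF-8 when no BOM is present is an assumption; the same 
check could be done before choosing which input stream to construct.

```python
import codecs

def detect_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess a codec name from a leading byte-order mark (hypothetical helper)."""
    # Check UTF-32 first: its little-endian BOM begins with the UTF-16 LE BOM.
    if data.startswith(codecs.BOM_UTF32_LE) or data.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec strips the UTF-8 BOM while decoding
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"     # this codec reads the BOM to pick the byte order
    # No BOM found: fall back to the caller's default (an assumption).
    return default

def decode_text(data: bytes) -> str:
    """Decode bytes using whichever encoding the BOM indicates."""
    return data.decode(detect_encoding(data))
```

Input without a BOM falls through to the default, so UTF-16 data that 
lacks one would still need some other hint (for example, a user-supplied 
encoding option).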

UTF-16 and UTF-8 encode non-ASCII Unicode characters quite 
differently, so if your input can include them, reading it as UTF8 
and just throwing away the nulls definitely isn't going to work.
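
To illustrate, here is a small Python sketch (the sample string and the 
helper strip_nulls_and_decode are made up for the demonstration).  It 
encodes ASCII text plus one non-ASCII character as UTF-16, then applies 
the "read as UTF-8 and drop the nulls" approach to it.

```python
text = "blah \u20ac"  # ASCII text followed by the euro sign, U+20AC

utf16 = text.encode("utf-16-le")  # two bytes per character here
utf8 = text.encode("utf-8")       # one byte per ASCII char, three for the euro

# For the ASCII part, UTF-16 is just UTF-8 with a padding null after each
# byte, which is why stripping nulls appears to work on plain 'blah'.
ascii_matches = utf16[:8].replace(b"\x00", b"") == utf8[:4]

# The euro sign is bytes AC 20 in UTF-16 LE but E2 82 AC in UTF-8.
euro_utf16 = "\u20ac".encode("utf-16-le")
euro_utf8 = "\u20ac".encode("utf-8")

def strip_nulls_and_decode(data: bytes) -> str:
    """The approach under discussion: drop nulls, then treat as UTF-8."""
    return data.replace(b"\x00", b"").decode("utf-8", errors="replace")

mangled = strip_nulls_and_decode(utf16)  # the euro sign does not survive
correct = utf16.decode("utf-16-le")      # decoding as UTF-16 round-trips
```

The ASCII prefix comes through intact, but the non-ASCII character is 
turned into a replacement character followed by a stray space.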

