[il-antlr-interest: 25462] Re: [antlr-interest] Recognizing 5-th hex digit

Gavin Lambert Wed, 26 Aug 2009 14:32:44 -0700

At 07:35 27/08/2009, Kieran Beltran wrote:
>I have encountered a problem when attempting to recognize two 
>required Standard Z symbols which are "above" the four-hex set 
>recognized by my generated lexer. The two symbols are \u1D538 and 
>\u1D53D.
[...]
>Is the solution to include a fifth digit to be recognized 
>optionally? Could I simply replace line 495 (as below) and add a 
>new fragment
>
>'u' ZDIGIT? XDIGIT XDIGIT XDIGIT XDIGIT


No.  It also depends on the stream encoding.  IIRC the Java target 
at least holds its input as UTF-16, i.e. as 16-bit chars.  So 
there's no "room" in a single character to store that fifth digit.

Instead, you need to encode it as a surrogate pair, i.e. two 
consecutive 16-bit code units. \u1D538, for example, would be 
encoded as \uD835\uDD38, and \u1D53D as \uD835\uDD3D.
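
In case it helps, here is a rough Java sketch of the arithmetic behind 
that split. The class and method names (SurrogatePairs, split, 
toEscapes) are made up for illustration and are not part of ANTLR; it 
only assumes the standard UTF-16 rule for code points above U+FFFF.

```java
// Illustrative only: SurrogatePairs is a hypothetical helper, not an ANTLR class.
class SurrogatePairs {
    // Split a supplementary code point (0x10000..0x10FFFF) into two UTF-16 code units.
    static char[] split(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int v = codePoint - 0x10000;                // 20 bits remain
        char high = (char) (0xD800 + (v >> 10));    // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    // Render the pair as two backslash-u escapes, one per code unit.
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : split(codePoint)) {
            sb.append(String.format("\\u%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toEscapes(0x1D538));
        System.out.println(toEscapes(0x1D53D));
    }
}
```

The JDK's own Character.toChars(int) gives the same two chars, so in 
real code you would probably call that instead of doing the shifts by 
hand.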


I'm not entirely sure how it works in the C target, which uses 
UTF-32 encoding by default; I've never really needed to use 
characters that high up.


