Re: 32-bit Unicode character literals

samppi Mon, 27 Apr 2009 07:08:10 -0700

I see. Does this mean that, if I expect to handle 32-bit characters,
then I need to consider changing my character-handling functions to
accept sequences or vectors instead of single characters?
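One option is to keep strings as the input and have the character-handling code work on int code points rather than char values. Here is a minimal Java sketch of that idea. It assumes Java 5 or later. The class `CodePoints` and its method `toArray` are hypothetical helpers, not part of any library. They rely only on `String.codePointCount`, `String.codePointAt` and `Character.charCount`. From Clojure the same methods are reachable through interop, e.g. `(.codePointAt s i)`.

```java
// Hypothetical helper (not a library API): decode a String into Unicode
// code points. A character above U+FFFF occupies two chars (a surrogate
// pair) in the String but comes back here as a single int.
final class CodePoints {
    private CodePoints() {}

    static int[] toArray(String s) {
        // codePointCount counts a valid pair as one, an unpaired surrogate as one
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);    // combines a valid surrogate pair
            out[n++] = cp;
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return out;
    }
}
```

For `"a\ud800\udc00b"` this yields three code points (0x61, 0x10000, 0x62), while `String.length()` reports 4. On Java 8 and later, `s.codePoints().toArray()` gives the same result.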


Also, how does (seq "\ud800\udc00") work? Does it split the character
into two 16-bit characters? In the REPL, it seems to return (\? \?).
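It does split it. A Java String stores U+10000 as two char values: the high surrogate 0xD800 followed by the low surrogate 0xDC00. `seq` on a string walks its chars, so it returns two characters. The `\?` in the REPL is probably the output encoder substituting `?` for a lone surrogate it cannot encode, not the actual char value. In Clojure, `(map int (seq "\ud800\udc00"))` should show the underlying values, 55296 and 56320.

The same inspection in Java is sketched below. `SurrogateSplit` is an illustrative name, and the code uses only `java.lang` methods available since Java 5.

```java
// Illustrative helper (hypothetical name): list each UTF-16 char of a String
// in hex, tagging high surrogates with (H) and low surrogates with (L).
final class SurrogateSplit {
    private SurrogateSplit() {}

    static String describeChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (i > 0) sb.append(' ');
            sb.append(Integer.toHexString(c));
            if (Character.isHighSurrogate(c)) sb.append("(H)");
            else if (Character.isLowSurrogate(c)) sb.append("(L)");
        }
        return sb.toString();
    }
}
```

For the string `"\ud800\udc00"`, `describeChars` returns `d800(H) dc00(L)`. `length()` is 2, but `codePointCount(0, 2)` is 1.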

On Apr 26, 6:22 pm, "Stephen C. Gilardi" <squee...@mac.com> wrote:
> On Apr 26, 2009, at 7:47 PM, samppi wrote:
>
> > user=> \u10000
> > java.lang.IllegalArgumentException: Invalid unicode character: \u10000
>
> > How would I embed the character as a literal in my Clojure code?
>
> Java characters are (still) 16 bits wide, so a single Java character
> cannot represent U+10000 or any other code point above U+FFFF.
> Since Clojure characters are Java characters, you'll need to do this
> the way the Java folks do.
>
> I found a blog post about it here:
>
>        http://weblogs.java.net/blog/joconner/archive/2004/04/unicode_40_supp...
>
> This is also a good reference:
>
>        http://www.fileformat.info/info/unicode/char/10000/index.htm
>
> The string representation given on that page does seem to work in
> Clojure:
>
>         "\ud800\udc00"
>
> --Steve
>
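The surrogate pair does not have to be looked up by hand; it can be computed from the code point. The sketch below uses `Character.toChars`, `Character.isSurrogatePair` and `Character.toCodePoint`, all in `java.lang` since Java 5. The class `Supplementary` and its methods are illustrative names, not an existing API. A Clojure equivalent would use interop, e.g. `(String. (Character/toChars 0x10000))`.

```java
// Illustrative helper (hypothetical name): convert between a code point
// and its UTF-16 string form.
final class Supplementary {
    private Supplementary() {}

    // One code point as a String; code points above U+FFFF become two chars.
    // Character.toChars throws IllegalArgumentException for an invalid code point.
    static String fromCodePoint(int cp) {
        return new String(Character.toChars(cp));
    }

    // Recover the code point from a high/low surrogate pair.
    static int fromPair(char high, char low) {
        if (!Character.isSurrogatePair(high, low)) {
            throw new IllegalArgumentException("not a surrogate pair");
        }
        return Character.toCodePoint(high, low);
    }

    // Escape every char in the four-hex-digit form used by Java and
    // Clojure string literals, to get text you can paste into source.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(String.format("\\u%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }
}
```

`fromCodePoint(0x10000)` returns the same two-char string as the literal `"\ud800\udc00"`. Passing that result to `escape` gives back the escape text itself.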
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to 
clojure+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---
