Re: Parsing Unicode character with Clojure

2015-08-09 Thread Andy Fingerhut
Oh, and whether or not Java regular expressions let you specify ranges of such characters outside the BMP, I have no idea. I would expect there to be odd behavior in that area of Java's regular expression implementation, but haven't done extensive testing myself to find out. I would recommend tha
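One way to probe this at a REPL is sketched below. It assumes Java 7 or later, where `java.util.regex.Pattern` accepts the `\x{h...h}` code point escape. The Emoticons block (U+1F600 through U+1F64F) and the names `emoticon` and `sample` are hypothetical choices made for illustration, not something from the thread. It is worth re-running on your own JVM rather than taking the behavior on trust.

```clojure
;; Hypothetical range for illustration: the Emoticons block, U+1F600..U+1F64F.
;; Clojure hands the regex literal to java.util.regex.Pattern unchanged,
;; so the \x{...} code point escape is interpreted by Java, not by the reader.
(def emoticon #"[\x{1F600}-\x{1F64F}]")

;; Build a test string containing U+1F600 without typing the character itself.
(def sample (str "ok " (String. (Character/toChars 0x1F600))))

;; Returns the matched text when a code point in the range is present.
(re-find emoticon sample)

;; Returns nil when nothing in the range is present.
(re-find emoticon "plain ASCII")
```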

Re: Parsing Unicode character with Clojure

2015-08-09 Thread Andy Fingerhut
Java uses UTF-16 encoding in memory for String objects. Characters in the Basic Multilingual Plane (BMP) are represented as a single 16-bit code unit in memory, but anything outside the BMP is represented as a surrogate pair, i.e. a sequence of two 16-bit code units. Clojure's \u syntax can only be used to directly represent a
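A minimal REPL sketch of the UTF-16 point above, assuming nothing beyond the JDK's `Character` and `String` methods. The code point U+1F600 and the name `grinning` are picked here purely for illustration.

```clojure
;; U+1F600 lies outside the BMP, so Java stores it as a surrogate pair.
;; Character/toChars returns the UTF-16 code units for a code point.
(def ^String grinning (String. (Character/toChars 0x1F600)))

;; Number of 16-bit code units (Java chars) in the string.
(.length grinning)                                 ; => 2

;; Number of Unicode code points in the same string.
(.codePointCount grinning 0 (.length grinning))    ; => 1

;; The code point that starts at index 0.
(.codePointAt grinning 0)                          ; => 128512, i.e. 0x1F600

;; Inside a string literal, the same character can be spelled as its two
;; surrogate code units.
(= grinning "\uD83D\uDE00")                        ; => true
```

Functions that walk a string one `char` at a time, such as `seq` or `count`, see two items here, which is the usual source of surprises when parsing such text.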

Parsing Unicode character with Clojure

2015-08-09 Thread 良ϖ
I've run into some trouble parsing a Unicode character with Clojure. I know it's likely to be a problem related to Java and not Clojure itself, but I'm looking for a Clojurish solution, so that's why I'm posting it here. FYI, I run GNU/Linux, on top of which I use Emacs 24 in conjunct