Re: Parsing Unicode character with Clojure

2015-08-09 Thread Andy Fingerhut
Oh, and whether or not Java regular expressions let you specify ranges of such characters outside the BMP, I have no idea. I would expect there to be odd behavior in that area of Java's regular expression implementation, but haven't done extensive testing myself to find out. I would recommend tha
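One way to probe the question about ranges outside the BMP is a small test like the sketch below. It assumes Java 7 or later, where `java.util.regex.Pattern` accepts the `\x{h...h}` code point escape. The class name, the method name, and the choice of the range U+1F600 to U+1F64F are illustrative only, not something from the original message. Treat it as a starting point for your own testing on your JVM, not as a statement that every corner of the regex engine behaves well here.

```java
import java.util.regex.Pattern;

// Hypothetical probe class; names and the chosen range are assumptions for illustration.
class SupplementaryRangeProbe {
    // A character class whose endpoints both lie outside the BMP,
    // written with the \x{h...h} code point escape (Java 7+).
    static final Pattern RANGE = Pattern.compile("[\\x{1F600}-\\x{1F64F}]");

    // Returns true if the single code point, encoded as a Java String,
    // is matched in full by the range pattern.
    static boolean inRange(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inRange(0x1F600)); // lower endpoint of the range
        System.out.println(inRange(0x1F64F)); // upper endpoint of the range
        System.out.println(inRange(0x1F650)); // just past the upper endpoint
        System.out.println(inRange(0x0041));  // 'A', a BMP character
    }
}
```

The same pattern string can be handed to `re-pattern` from Clojure, since Clojure regexes compile to `java.util.regex.Pattern`.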

Re: Parsing Unicode character with Clojure

2015-08-09 Thread Andy Fingerhut
Java uses UTF-16 encoding in memory for String objects. Characters in the Basic Multilingual Plane (BMP) are represented as a single 16-bit char in memory, but anything outside the BMP is represented as a sequence of two 16-bit chars (a surrogate pair). Clojure's \u syntax can only be used to directly represent a
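The UTF-16 point above can be checked directly on the JVM. The sketch below uses only standard `String` and `Character` methods; the class and helper names are made up for illustration, and U+00E9 and U+1F600 are just convenient sample code points, one inside the BMP and one outside it.

```java
// Hypothetical demo class; names are assumptions for illustration.
class SurrogatePairDemo {
    // Build a String holding exactly one Unicode code point.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Number of 16-bit char units, which is what String.length() reports.
    static int utf16Units(String s) {
        return s.length();
    }

    // Number of Unicode code points in the String.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x00E9);     // inside the BMP
        String astral = fromCodePoint(0x1F600); // outside the BMP

        System.out.println(utf16Units(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(utf16Units(astral) + " unit(s), " + codePoints(astral) + " code point(s)");

        // The two units of the non-BMP character form a surrogate pair.
        System.out.println(Character.isHighSurrogate(astral.charAt(0)));
        System.out.println(Character.isLowSurrogate(astral.charAt(1)));
    }
}
```

From Clojure the same checks are available through interop, for example `(.codePointCount s 0 (.length s))`.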