Oh, and whether or not Java regular expressions let you specify ranges of
such characters outside the BMP, I have no idea. I would expect there to
be odd behavior in that area of Java's regular expression implementation,
but haven't done extensive testing myself to find out. I would recommend
tha
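
One way to check the range question directly is sketched below. It assumes Java 7 or later, where `java.util.regex.Pattern` accepts `\x{h...h}` code point escapes. The class name `SupplementaryRangeCheck`, the helper `inRange`, and the range U+1F600 to U+1F64F are illustrative choices only, not something from this thread.

```java
import java.util.regex.Pattern;

final class SupplementaryRangeCheck {
    // Character class spanning U+1F600..U+1F64F, written with \x{...}
    // code point escapes (assumes Java 7+). The range is an arbitrary example.
    static final Pattern RANGE = Pattern.compile("[\\x{1F600}-\\x{1F64F}]");

    // Hypothetical helper: true if the single code point falls inside RANGE.
    static boolean inRange(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return RANGE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inRange(0x1F600)); // first code point of the range
        System.out.println(inRange(0x1F64F)); // last code point of the range
        System.out.println(inRange(0x1F650)); // just past the end of the range
        System.out.println(inRange('A'));     // an ordinary BMP character
    }
}
```

If the first two lines print true and the last two print false, the pattern is treating each surrogate pair as one code point rather than as two separate 16-bit values.
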
Java uses UTF-16 encoding in memory for String objects. Characters in the
Basic Multilingual Plane (BMP) are represented as a single 16-bit char in
memory, but anything outside the BMP is represented as a pair of
16-bit chars (a surrogate pair). Clojure's \u syntax can only be used to
directly represent a
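
The UTF-16 point above can be seen with a minimal sketch like the following. The class name `SurrogatePairDemo` and its two helpers are made up for illustration, and U+00E9 and U+1F600 are just sample code points; only standard `java.lang.Character` and `String` methods are used.

```java
final class SurrogatePairDemo {
    // Number of 16-bit chars Java needs to store one code point.
    static int utf16Units(int codePoint) {
        return Character.toChars(codePoint).length;
    }

    // Build a String from a code point; a code point outside the BMP
    // comes back as a surrogate pair (two chars).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String bmp = fromCodePoint(0x00E9);      // U+00E9, inside the BMP
        String astral = fromCodePoint(0x1F600);  // U+1F600, outside the BMP

        System.out.println(bmp.length());        // 1
        System.out.println(astral.length());     // 2
        // Counting code points instead of chars gives 1 for the same string.
        System.out.println(astral.codePointCount(0, astral.length()));
        // The two chars are the high and low halves of the surrogate pair.
        System.out.println(Character.isHighSurrogate(astral.charAt(0)));
        System.out.println(Character.isLowSurrogate(astral.charAt(1)));
    }
}
```

Since Clojure strings are Java strings, the same methods should be reachable through interop, e.g. `(.codePointCount s 0 (.length s))`.
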
I've run into some trouble parsing a Unicode character with
Clojure. I know it's likely to be a problem related to Java and not
Clojure itself, but I'm looking for a Clojure-ish solution, so that's why
I'm posting it here. FYI, I'm on a GNU/Linux OS, on top of which
I use Emacs 24 in conjunct