Oh, and whether or not Java regular expressions let you specify ranges of
such characters outside the BMP, I have no idea. I would expect there to
be odd behavior in that area of Java's regular expression implementation,
but haven't done extensive testing myself to find out. I would recommend
tha
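
For anyone who wants to check the range question directly, here is a small sketch that could be run to see what your JDK does. The class name SupplementaryRegexCheck and the helper inEmoticonRange are made up for illustration, and the range U+1F600 to U+1F64F is just an arbitrary block outside the BMP. It assumes Java 7 or later, since it uses the \x{h...h} code point escape described in the java.util.regex.Pattern documentation.

```java
import java.util.regex.Pattern;

class SupplementaryRegexCheck {
    // Hypothetical helper: a character class whose endpoints are both
    // outside the BMP, written with the \x{...} code point escape.
    private static final Pattern EMOTICONS =
            Pattern.compile("[\\x{1F600}-\\x{1F64F}]");

    // True if the entire input is exactly one code point inside the range.
    static boolean inEmoticonRange(String s) {
        return EMOTICONS.matcher(s).matches();
    }

    // Builds a String from a single code point (two chars if outside the BMP).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String inside = fromCodePoint(0x1F600);   // first code point of the range
        String outside = fromCodePoint(0x1F650);  // just past the end of the range

        // A character outside the BMP takes two 16-bit chars but is one code point.
        System.out.println(inside.length());                           // 2
        System.out.println(inside.codePointCount(0, inside.length())); // 1

        // If Pattern treats the class as a range of code points, the first
        // call matches and the other two do not.
        System.out.println(inEmoticonRange(inside));
        System.out.println(inEmoticonRange(outside));
        System.out.println(inEmoticonRange("A"));
    }
}
```

If the range only matched one 16-bit half of the character, matches() on the two-char string would fail, so this is a reasonably direct test of whether ranges work on whole characters rather than on 16-bit units.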
Java uses UTF-16 encoding in memory for String objects. Characters in the
Basic Multilingual Plane (BMP) are represented as a single 16-bit char in
memory, but anything outside the BMP is represented as a surrogate pair,
that is, a sequence of two 16-bit chars. Clojure's \u syntax can only be used to
directly represent a