On 19 Jan, 12:03, joachim <joachim.de.be...@gmail.com> wrote:
> On Jan 18, 8:23 pm, Andy Fingerhut <andy.finger...@gmail.com> wrote:
>
> > I don't have some code lying around to do that, but I might make one.  The
> > name strings would require several megabytes of storage, but as long as you
> > don't mind that...
>
> I wouldn't mind, but the code you provided is already more than I was
> hoping for, so thanks again!
> Jm

Have you checked out ICU4J?

    Dependency: [com.ibm.icu/icu4j "4.8.1.1"]
    Javadoc: http://icu-project.org/apiref/icu4j/

You can do this with it:

    (require '[clojure.string :as str])
    (import 'com.ibm.icu.lang.UCharacter)

    (defn char-names [s]
      (UCharacter/getName s ", "))

    (defn strip-supplementary [s]
      (str/replace s #"[^\u0000-\uFFFF]+" char-names))

    (strip-supplementary
      "The first three letters of the Gothic alphabet are: \uD800\uDF30\uD800\uDF31\uD800\uDF32")
    ;=> "The first three letters of the Gothic alphabet are: GOTHIC LETTER AHSA, GOTHIC LETTER BAIRKAN, GOTHIC LETTER GIBA"
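
If pulling in ICU4J is not an option, roughly the same thing can probably
be done with the JDK alone. The following is only a sketch, assuming
Java 7 or later (where java.lang.Character/getName exists); the names
code-points, jdk-char-names and strip-supplementary-jdk are made up for
this example and are not part of any library.

```clojure
(require '[clojure.string :as str])

(defn code-points
  "Returns a vector of the Unicode code points in s, walking the string
  by code point so that a surrogate pair counts as one entry."
  [^String s]
  (loop [i 0, acc []]
    (if (< i (.length s))
      (let [cp (.codePointAt s i)]
        ;; charCount is 2 for supplementary code points, 1 otherwise
        (recur (+ i (Character/charCount cp)) (conj acc cp)))
      acc)))

(defn jdk-char-names
  "Joins the Unicode names of all code points in s with \", \"."
  [s]
  (str/join ", " (map #(Character/getName (int %)) (code-points s))))

(defn strip-supplementary-jdk
  "Replaces each run of supplementary characters with their names."
  [s]
  (str/replace s #"[^\u0000-\uFFFF]+" jdk-char-names))

(strip-supplementary-jdk "Gothic: \uD800\uDF30\uD800\uDF31")
;=> "Gothic: GOTHIC LETTER AHSA, GOTHIC LETTER BAIRKAN"
```

The names come from whatever Unicode data ships with the JDK in use, so
characters newer than that JDK may come back as nil.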

On 17 Jan, 19:14, Andy Fingerhut <andy.finger...@gmail.com> wrote:
> Rasmus, thanks for that suggestion.  I have seen this regular expression
> before recently for the same purpose, but not an explanation for why it
> matches only supplementary characters.  Do you know, or have you read
> somewhere, a good explanation for that?

Hrm. To be honest, I first tried to match characters in the range
U+D800 - U+DFFF, but that didn't work for some reason. I then googled a
bit and found the above regex. It worked, so I just used it and didn't
think about it much. It is a bit weird indeed. It gives you the
impression that Java regexes are aware of supplementary characters
somehow. Maybe it would also work to match everything that is not in
the non-surrogate part of the BMP? Like this:
#"[^\u0000-\uD7FF\uE000-\uFFFF]". That regex would be a bit clearer, I
think.
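
A quick way to compare the two character classes is to run both over the
same string. This is a minimal sketch; sample, outside-bmp,
outside-non-surrogate-bmp and matches are names invented for the
example. As far as I understand it, java.util.regex matches by code
point, so a well-formed surrogate pair is seen as one supplementary code
point and not as two chars in U+D800 - U+DFFF, which would explain both
why the negated classes work and why matching the surrogate range
directly does not behave as expected.

```clojure
;; Two Gothic letters (U+10330, U+10331) written as surrogate pairs.
(def sample "abc \uD800\uDF30\uD800\uDF31 def")

;; Regex from the earlier post: anything that is not a BMP code point.
(def outside-bmp #"[^\u0000-\uFFFF]+")

;; Alternative: anything not in the two non-surrogate BMP ranges.
(def outside-non-surrogate-bmp #"[^\u0000-\uD7FF\uE000-\uFFFF]+")

(defn matches
  "Returns a vector of all substrings of s matched by re."
  [re s]
  (vec (re-seq re s)))

;; On well-formed input both find the same single run of two letters.
(= (matches outside-bmp sample)
   (matches outside-non-surrogate-bmp sample))
;=> true
```

The two should only differ on malformed input: the second class also
excludes U+D800 - U+DFFF from what it skips, so by my reading it would
match an unpaired surrogate as well, while the first would not.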

// Rasmus
