On 19 Jan, 12:03, joachim <joachim.de.be...@gmail.com> wrote:
> On Jan 18, 8:23 pm, Andy Fingerhut <andy.finger...@gmail.com> wrote:
>
> > I don't have some code lying around to do that, but I might make one. The
> > name strings would require several megabytes of storage, but as long as you
> > don't mind that...
>
> I wouldn't mind, but the code you provided is already more than I was
> hoping for, so thanks again!
>
> Jm
Have you checked out ICU4J?

Dependency: [com.ibm.icu/icu4j "4.8.1.1"]
Javadoc: http://icu-project.org/apiref/icu4j/

You can do this with it:

(require '[clojure.string :as str])
(import 'com.ibm.icu.lang.UCharacter)

;; Comma-separated Unicode names of every character in s
(defn char-names [s]
  (UCharacter/getName s ", "))

;; Replace each run of supplementary characters with their names
(defn strip-supplementary [s]
  (str/replace s #"[^\u0000-\uFFFF]+" char-names))

(strip-supplementary "The first three letters of the Gothic alphabet are: \uD800\uDF30\uD800\uDF31\uD800\uDF32")
;=> "The first three letters of the Gothic alphabet are: GOTHIC LETTER AHSA, GOTHIC LETTER BAIRKAN, GOTHIC LETTER GIBA"

On 17 Jan, 19:14, Andy Fingerhut <andy.finger...@gmail.com> wrote:
> Rasmus, thanks for that suggestion. I have seen this regular expression
> before recently for the same purpose, but not an explanation for why it
> matches only supplementary characters. Do you know, or have you read
> somewhere, a good explanation for that?

Hrm. To be honest, I first tried to match characters in the range U+D800 to U+DFFF, but that didn't work for some reason. I then googled a bit and found the above regex. It worked, so I just used it and didn't think about it much.

It is a bit weird indeed. It gives the impression that Java regexes are somehow aware of supplementary characters. Maybe it would also work to match everything outside the non-surrogate parts of the BMP? Like this: #"[^\u0000-\uD7FF\uE000-\uFFFF]". That regex would be a bit clearer, I think.

// Rasmus
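On Andy's question of why #"[^\u0000-\uFFFF]" matches only supplementary characters: as far as I can tell, java.util.regex (which Clojure regex literals compile to) tests character classes against code points rather than individual chars, so a valid surrogate pair such as \uD800\uDF30 is read as the single code point U+10330, which lies outside U+0000 to U+FFFF. Below is a minimal Java 8+ sketch of that, not a definitive implementation. The class SupplementaryRegexDemo and its helpers are hypothetical names, and the JDK's Character.getName (Java 7+) stands in for ICU4J's UCharacter/getName. It also shows one difference between the two regexes: the alternative #"[^\u0000-\uD7FF\uE000-\uFFFF]" additionally matches an unpaired surrogate, which the engine sees as a code point in U+D800 to U+DFFF.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SupplementaryRegexDemo {
    // Everything outside U+0000..U+FFFF, i.e. supplementary code points.
    // The engine reads a valid surrogate pair as one code point above U+FFFF,
    // so the pair never looks like two BMP chars to this class.
    static final Pattern OUTSIDE_BMP = Pattern.compile("[^\\u0000-\\uFFFF]+");

    // The suggested alternative: anything that is not a non-surrogate BMP
    // code point. Besides supplementary characters, this also matches
    // unpaired surrogates (code points U+D800..U+DFFF).
    static final Pattern NOT_BMP_SCALAR =
            Pattern.compile("[^\\u0000-\\uD7FF\\uE000-\\uFFFF]+");

    // Comma-separated Unicode names of the code points in s.
    // Hypothetical stand-in for ICU4J's UCharacter.getName(String, String);
    // Character.getName returns null for unassigned code points, which
    // StringBuilder appends as "null".
    static String charNames(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (sb.length() > 0) sb.append(", ");
            sb.append(Character.getName(cp));
        });
        return sb.toString();
    }

    // Replace every run matched by p with the names of its code points,
    // like the Clojure strip-supplementary above.
    static String stripSupplementary(Pattern p, String s) {
        Matcher m = p.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(charNames(m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String gothic = "The first three letters of the Gothic alphabet are: "
                + "\uD800\uDF30\uD800\uDF31\uD800\uDF32";
        System.out.println(stripSupplementary(OUTSIDE_BMP, gothic));

        // An unpaired high surrogate is the code point U+D800, which is
        // inside the BMP range, so only the second pattern matches it.
        String lone = "a\uD800b";
        System.out.println(OUTSIDE_BMP.matcher(lone).find());    // false
        System.out.println(NOT_BMP_SCALAR.matcher(lone).find()); // true
    }
}
```

Both patterns give the same result on well-formed text; they differ only on malformed strings containing lone surrogates, where the first leaves them alone and the second replaces them.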