The troubling thing isn't the use of Normalizer to remove accents, but the use of .toUpperCase, .toLowerCase, and .equalsIgnoreCase instead of Normalizer, which may run into problems. For example, you probably want "weiß" and "WEISS" to compare equal when ignoring case, but .equalsIgnoreCase only matches strings of the same length, so it can never equate "ß" with "SS". For a case-insensitive comparison I tend to compare the outputs of this for two strings:

(defn normalize
  "Given a string, normalizes it so that it may be used as a key in a hashmap
  and compare equal to all strings representing the same word/spelling. There
  are edge cases that .toLowerCase or .toUpperCase would not handle, so the
  actual procedure uses java.text.Normalizer as well as both of the above."
  ;; => (= (normalize "ß") (normalize "sS"))            ; true
  ;; => (= (normalize "\u00e9") (normalize "e\u0301"))  ; true
  ;; Note that the latter are two different és: one is integral (U+00E9) and
  ;; one uses a combining diacritic (e followed by U+0301). They are written
  ;; as escapes here so that the file encoding cannot erase the difference.
  [^String s]
  (-> s
      ;; NFKC first, so composed and decomposed spellings become identical
      (java.text.Normalizer/normalize java.text.Normalizer$Form/NFKC)
      ;; upper-casing expands "ß" to "SS"; lower-casing then gives one form
      (.toUpperCase)
      (.toLowerCase)))

Of course, for some uses you want to compare the results of stripping accents entirely. User text search is the obvious one: a user input of "desole" should match "désolé", so that people with en-US keyboards and operating systems can find it without jumping through hoops. This matters most with name searches. For example, one might search for Hervé Jean-Pierre Villechaize with "herve jean pierre villechaize" and not fail to discover his role in The Man with the Golden Gun.
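
The accent-stripping comparison described above could be sketched roughly as follows. This is only an illustration under a few assumptions of mine, not code from the thread: the name search-key is hypothetical, folding punctuation and whitespace into single spaces is my own choice so that "Jean-Pierre" can match "jean pierre", and Locale/ROOT is used so that the case mapping does not depend on the JVM's default locale.

```clojure
(import '(java.text Normalizer Normalizer$Form)
        '(java.util Locale))

(defn search-key
  "Hypothetical helper: reduces a string to an accent-free, case-folded key
  for loose matching in user text search. A sketch, not a complete solution."
  [^String s]
  (-> (Normalizer/normalize s Normalizer$Form/NFKD) ; split letters from their accents
      (.replaceAll "\\p{M}+" "")                    ; drop the combining marks
      (.replaceAll "[\\p{Punct}\\s]+" " ")          ; assumption: punctuation acts as a space
      (.toUpperCase Locale/ROOT)                    ; expands "ß" to "SS"
      (.toLowerCase Locale/ROOT)))

(search-key "d\u00e9sol\u00e9") ; => "desole"
(= (search-key "Herv\u00e9 Jean-Pierre Villechaize")
   (search-key "herve jean pierre villechaize")) ; => true
```

Decomposing with NFKD rather than composing with NFKC is what makes the accents removable: each accented letter becomes a base letter followed by combining marks, which the \p{M} pattern then deletes. Letters that have no decomposition, such as "ø" or "đ", pass through unchanged, so a real search index would probably need extra folding rules for those.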