The troubling thing isn't the use of Normalizer to remove accents, but the 
use of .toUpperCase, .toLowerCase, and .equalsIgnoreCase on their own, 
without Normalizer, which can give wrong answers. For example, you probably 
want "weiß" and "WEISS" to compare equal when ignoring case, yet 
(.equalsIgnoreCase "weiß" "WEISS") returns false. For a case-insensitive 
comparison I tend to compare the outputs of this for two strings:

(defn normalize
  "Given a string, normalizes it so that it may be used as a key in a hashmap
   and compare equal to all strings representing the same word/spelling.
   There are edge cases that .toLowerCase or .toUpperCase would not handle,
   so the actual procedure uses java.text.Normalizer as well as both of the
   above."
  ;; => (= (normalize "ß") (normalize "sS"))
  ;; true
  ;; => (= (normalize "e\u0301") (normalize "\u00e9"))
  ;; true
  ;; The latter are two different és: one is e plus a combining acute
  ;; accent (U+0301), the other is the precomposed character (U+00E9).
  [^String s]
  (-> s
      ;; NFKC composes e + U+0301 into é and folds compatibility characters.
      (java.text.Normalizer/normalize java.text.Normalizer$Form/NFKC)
      ;; Upper-casing first expands ß to SS; lower-casing then yields ss.
      ;; Locale/ROOT keeps the key independent of the JVM's default locale
      ;; (otherwise e.g. Turkish dotted/dotless i rules would apply).
      (.toUpperCase java.util.Locale/ROOT)
      (.toLowerCase java.util.Locale/ROOT)))

Of course, for some uses you want to compare the results of stripping 
accents entirely. User text search is the typical case: a user input of 
"desole" should match "désolé", so that people with en-US keyboards and 
operating systems can find it without jumping through hoops. This matters 
most for name searches; e.g. one might search for Hervé Jean-Pierre 
Villechaize with "herve jean pierre villechaize" and should not fail to 
discover his role in The Man with the Golden Gun.
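For that search case, stripping accents might look something like the 
sketch below. It is only an illustration under some assumptions: the names 
search-key and search-match? are hypothetical, it needs Clojure 1.8+ for 
clojure.string/includes?, and it treats every punctuation character (such 
as the hyphen in "Jean-Pierre") as a word separator. It decomposes with 
NFKD, deletes all combining marks (Unicode category M), then applies the 
same upper-then-lower case mapping as normalize.

```clojure
(ns example.search-key
  (:require [clojure.string :as str])
  (:import (java.text Normalizer Normalizer$Form)
           (java.util Locale)))

(defn search-key
  "Hypothetical helper: reduces a string to an accent-free, lower-case key
  for loose user-facing search. Unlike normalize, this deliberately throws
  information away, so it should not be used for exact equality."
  [^String s]
  (-> s
      ;; NFKD splits precomposed letters such as é into e + U+0301
      ;; (combining acute accent) and folds compatibility forms.
      (Normalizer/normalize Normalizer$Form/NFKD)
      ;; Drop every combining mark, i.e. the accents themselves.
      (str/replace #"\p{M}" "")
      ;; Upper then lower, so that ß ends up as ss.
      (.toUpperCase Locale/ROOT)
      (.toLowerCase Locale/ROOT)
      ;; Assumption: hyphens and other punctuation separate words.
      (str/replace #"[\p{P}\s]+" " ")
      (str/trim)))

(defn search-match?
  "Hypothetical helper: true when the query's search-key occurs inside the
  candidate's search-key."
  [query candidate]
  (str/includes? (search-key candidate) (search-key query)))

;; (search-key "Hervé Jean-Pierre Villechaize")
;; => "herve jean pierre villechaize"
```

Plain substring matching is just the simplest choice here; a real search 
index would more likely store the search-key of each name once and match 
on tokens.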

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en