I think the best way to do this is to get PyICU to include the Transform classes in its bindings, apply the transform "lower; latin; nfkd" with them, and then strip anything that isn't a legal username character (anything matching [^-_a-zA-Z]). This removes accents and other combining characters.
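As a rough illustration of the decompose-then-strip idea, here is a minimal sketch that uses Python's standard unicodedata module in place of ICU. It is not the PyICU approach proposed above: the "latin" transliteration step is omitted because the standard library has no equivalent, and the function name sanitize_username is made up for this example.

```python
import re
import unicodedata

# Assumption: the legal username characters are exactly those in the
# character class quoted above; everything else is stripped.
ILLEGAL = re.compile(r"[^-_a-zA-Z]")


def sanitize_username(name):
    """Lowercase, decompose with NFKD, then drop every illegal character.

    Hypothetical helper for illustration only. NFKD splits an accented
    letter into its base letter plus combining marks, and the regex then
    removes the combining marks along with anything else that is illegal.
    """
    decomposed = unicodedata.normalize("NFKD", name.lower())
    return ILLEGAL.sub("", decomposed)


print(sanitize_username("José"))   # jose
print(sanitize_username("Bjørn"))  # bjrn
```

The second call shows the limitation of this approach: ø has no Unicode decomposition, so NFKD leaves it intact and the strip step then deletes the letter outright.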
This will still need special handling for some characters, including the example given of ø. My testing and an IBM FAQ entry [1] indicate that there are several special cases that the normal Unicode transform doesn't get right, so we'll have to hand-transform some characters, such as ß and æ. Basically anything listed in the IBM article.

BTW, you can play around with Unicode transforms online [2]. It's pretty interesting. For our purposes, the 'Names' data is particularly relevant.

Unfortunately, the PyICU bindings do *not* have the Transform bits of ICU wrapped yet.

[1] http://ibm.com/support/docview.wss?uid=swg21247569
[2] http://demo.icu-project.org/icu-bin/translit

-- 
Install menu removes foreign characters from user name
https://bugs.launchpad.net/bugs/388028