I think the best way to do this is to have PyICU include the Transform
classes in its bindings, use those with the following transform:
"lower; latin; nfkd", and then remove by hand anything that isn't a legal
username character ([^-_a-zA-Z]).  This will remove accents and other
combining characters.
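
As a rough sketch of the decompose-and-strip step only, here is what it
could look like with Python's standard unicodedata module instead of ICU.
This is an assumption on my part, not the ICU transform itself: it covers
"lower" and "nfkd" but has no equivalent of the "latin" transliteration
step, and strip_to_username is a hypothetical name.

```python
import re
import unicodedata


def strip_to_username(name):
    """Lowercase, decompose, and drop everything that is not a legal character."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name.lower())
    # The character class is the one quoted above; the combining marks fall
    # outside it, so they are removed along with spaces, digits, etc.
    return re.sub(r"[^-_a-zA-Z]", "", decomposed)


if __name__ == "__main__":
    print(strip_to_username("José"))
    print(strip_to_username("Ångström"))
```

Note that characters with no decomposition (non-Latin scripts, for
instance) are simply dropped by this sketch, which is exactly where the
"latin" transform would be needed.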

This will still need special handling for some characters, including the
example given of ø.  My testing and an IBM FAQ entry [1] indicate that
there are several special cases that the normal Unicode transforms don't
handle correctly.  So we'll have to hand-transform some things, like ß
and æ: basically anything listed in the IBM article.
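
To illustrate why the hand-transform is needed, here is a small sketch
using Python's standard unicodedata module (not ICU).  The three
characters mentioned above have no Unicode decomposition, so NFKD leaves
them untouched.  The SPECIAL_CASES table and its replacement values are
my own illustrative guesses, not the list from the IBM article.

```python
import unicodedata

# Hypothetical mapping: only the three characters named in this message,
# with assumed ASCII replacements.  The real table would come from [1].
SPECIAL_CASES = {"ø": "o", "ß": "ss", "æ": "ae"}


def apply_special_cases(text):
    """Replace characters that normalization cannot decompose."""
    return "".join(SPECIAL_CASES.get(ch, ch) for ch in text)


if __name__ == "__main__":
    for ch in SPECIAL_CASES:
        # NFKD returns these characters unchanged.
        print(ch, unicodedata.normalize("NFKD", ch) == ch)
    print(apply_special_cases("søren"))
```

This step would have to run before the illegal characters are stripped,
otherwise ø, ß and æ would just be deleted.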

BTW, you can play around with Unicode transforms online [2].  It's
pretty interesting.  For our purposes, using the 'Names' data is
particularly relevant.

Unfortunately, the PyICU bindings do *not* have the Transform bits of
ICU wrapped yet.

[1] http://ibm.com/support/docview.wss?uid=swg21247569
[2] http://demo.icu-project.org/icu-bin/translit

-- 
Install menu removes foreign characters from user name
https://bugs.launchpad.net/bugs/388028