On Sat, 01 Sep 2007 18:56:38 -0300, Ricardo Aráoz wrote:

> Hi, I've been working on sorting out some words.
>
> My locale is :
> >>> import locale
> >>> locale.getdefaultlocale()
> ('es_AR', 'cp1252')
>
> I do :
> >>> a = 'áéíóúäëïöüàèìòù'
> >>> print ''.join(sorted(a, cmp=lambda x,y: locale.strcoll(x,y)))
> aeiouàáäèéëìíïòóöùúü
The lambda is superfluous. Just write cmp=locale.strcoll instead.

> This is not what I am expecting. I was expecting :
> aáàäeéèëiíìï.....etc.
>
> The reason is that if you want to order some words (say for a dictionary
> (paper dict, where you look up words)) this is what happens :
> >>> a = 'palàbra de pàlabra de pblabra'
> >>> print ' '.join(sorted(a.split(), cmp=lambda x,y: locale.strcoll(x, y)))
> de de palàbra pblabra pàlabra
>
> While any human being would expect :
>
> de de palàbra pàlabra pblabra
>
> Does anybody know a way in which I could get the desired output?

I suppose it would work on your machine if you set the locale first:

>>> locale.setlocale(locale.LC_ALL, "")
'de_DE.UTF-8'

On my machine I have to use a list instead of a string, because my locale uses UTF-8, where one character may consist of more than one byte. (Providing key is more efficient than cmp, because the key is computed once per item instead of once per comparison.)

>>> a = ['á', 'é', 'í', 'ó', 'ú', 'ä', 'ë', 'ï', 'ö', 'ü', 'à', 'è', 'ì', 'ò',
... 'ù', 'a', 'e', 'i', 'o', 'u']
>>> print "".join(sorted(a, key=locale.strxfrm))
aáàäeéèëiíìïoóòöuúùü

However, to make your program a bit more portable I recommend that you use unicode instead of str:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'de_DE.UTF-8'
>>> encoding = locale.getlocale()[1]
>>> def sortkey(s):
...     return locale.strxfrm(s.encode(encoding))
...
>>> print "".join(sorted(u"áéíóúäëïöüàèìòùaeiou", key=sortkey))
aáàäeéèëiíìïoóòöuúùü

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list
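The sessions in this thread are Python 2. In Python 3, sorted() no longer accepts cmp, and str is already Unicode, so locale.strxfrm can be applied directly without an encode() step. Below is a minimal Python 3 sketch of both approaches (the strxfrm key and the old strcoll comparison wrapped with functools.cmp_to_key). The helper names locale_sorted, locale_sorted_cmp and _set_collation are hypothetical, and the sketch assumes the requested locale (for example es_AR.UTF-8) is installed, falling back to the C locale (plain code-point order) when it is not.

```python
import functools
import locale


def _set_collation(loc):
    """Hypothetical helper: switch LC_COLLATE to `loc`.

    Falls back to "C" if `loc` is not installed (an assumption, not the
    thread's method). Returns the previous setting so it can be restored.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # no 2nd arg: query only
    try:
        locale.setlocale(locale.LC_COLLATE, loc)
    except locale.Error:
        # Requested locale missing: use plain code-point ordering instead.
        locale.setlocale(locale.LC_COLLATE, "C")
    return previous


def locale_sorted(words, loc=""):
    """Sort with key=locale.strxfrm: one transform per item.

    loc="" means "take the collation from the user's environment",
    like locale.setlocale(locale.LC_ALL, "") in the thread.
    """
    previous = _set_collation(loc)
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        # setlocale changes process-wide state, so put it back.
        locale.setlocale(locale.LC_COLLATE, previous)


def locale_sorted_cmp(words, loc=""):
    """Same ordering via strcoll, i.e. the old cmp=locale.strcoll style.

    Python 3's sorted() has no cmp argument, so the comparison function
    is wrapped with functools.cmp_to_key.
    """
    previous = _set_collation(loc)
    try:
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
```

For example, locale_sorted("palàbra de pàlabra de pblabra".split(), "es_AR.UTF-8") sorts the words by the Argentine Spanish collation rules if that locale is installed; where exactly the accented forms land still depends on the platform's collation tables. Only LC_COLLATE is touched, because that is the category strcoll and strxfrm consult. Since setlocale is not thread-safe, a multi-threaded program would rather set the locale once at startup.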