On 9/4/07, Tuomas <[EMAIL PROTECTED]> wrote:
> Gabriel Genellina wrote:
> > On Tue, 04 Sep 2007 07:34:54 -0300, Tuomas
> > <[EMAIL PROTECTED]> wrote:
> >
> >> Python 2.4.3 (#3, Jun 4 2006, 09:19:30)
> >> [GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2
> >> Type "help", "copyright", "credits" or "license" for more information.
> >> >>> import locale
> >> >>> def key(s):
> >> ...     locale.setlocale(locale.LC_COLLATE, 'en_US.utf8')
> >> ...     return locale.strxfrm(s.encode('utf8'))
> >> ...
> >> >>> first=key(u'maupassant guy')
> >> >>> first==key(u'maupassant guy')
> >> False
> >> >>> first
> >> '\x18\x0c \x1b\x0c\x1e\x1e\x0c\x19\x1f\x12 $\x01\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x01\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x01\xf5\xb79'
> >> >>> key(u'maupassant guy')
> >> '\x18\x0c \x1b\x0c\x1e\x1e\x0c\x19\x1f\x12 $\x01\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x01\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x02\x01\xb5'
> >> >>>
> >>
> >> Maybe this is enough for a sort order, but I need to be able to catch
> >> equal strings too. Any hints/explanations?
> >
> > I can't use your same locale, but with my own locale settings, I get
> > consistent results:
> >
> > Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
> > (Intel)] on win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > py> import locale
> > py> locale.setlocale(locale.LC_COLLATE, 'Spanish_Argentina')
> > 'Spanish_Argentina.1252'
> > py> def key(s):
> > ...     return locale.strxfrm(s.encode('utf8'))
> > ...
>
> Because I am writing a multi-language application, I need to place the
> locale setting inside the key function. Actually, I am implementing a
> binary search in a list of strings sorted according to the locale, so I
> need to be able to count on stable results from strxfrm even if another
> locale has been visited in the meantime. Could repeated calls to
> setlocale cause problems?
>
> > py> first=key(u'maupassant guy')
> > py> print repr(first)
> > '\x0eQ\x0e\x02\x0e\x9f\x0e~\x0e\x02\x0e\x91\x0e\x91\x0e\x02\x0ep\x0e\x99\x07\x02\x0e%\x0e\x9f\x0e\xa7\x01\x01\x01\x01'
> > py> print repr(key(u'maupassant guy'))
> > '\x0eQ\x0e\x02\x0e\x9f\x0e~\x0e\x02\x0e\x91\x0e\x91\x0e\x02\x0ep\x0e\x99\x07\x02\x0e%\x0e\x9f\x0e\xa7\x01\x01\x01\x01'
> > py> print first==key(u'maupassant guy')
> > True
> >
> > Same thing with Python 2.4.4
>
> I get the same instability with my locale 'fi_FI.utf8' too, so I am
> wondering whether the source of the problem is the C library or the
> Python wrapper around it.
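Both questions in the thread (are strxfrm keys reproducible, and does calling setlocale inside the key function cause trouble?) can be probed with a small sketch. It is written for Python 3, where locale.strxfrm takes a str rather than UTF-8 bytes, and the names key, key_is_stable and find are illustrative, not from the thread. key() keeps setlocale inside the key function as described above; the Python documentation warns that setlocale() is not thread-safe on most systems, because it changes process-wide state. find() binary-searches precomputed keys with bisect and then confirms a candidate with locale.strcoll() instead of comparing keys byte for byte. That confirmation only helps if the keys are stable enough to order the list in the first place.

```python
import bisect
import locale


def key(s, locale_name):
    """Collation key for s under locale_name.

    Like the key() in the thread, this calls setlocale() on every call. That
    changes process-wide state, so it is not safe to share across threads.
    """
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return locale.strxfrm(s)  # Python 3: str in, str out


def key_is_stable(s, locale_name, trials=100):
    """Return True if key(s, locale_name) gives the same result on every call."""
    first = key(s, locale_name)
    return all(key(s, locale_name) == first for _ in range(trials))


def find(keys, words, target, locale_name):
    """Binary-search words (sorted by key) for target; return its index or -1.

    keys[i] must equal key(words[i], locale_name). A candidate is confirmed
    with strcoll() rather than by comparing keys byte for byte.
    """
    i = bisect.bisect_left(keys, key(target, locale_name))
    # key() above left LC_COLLATE set to locale_name, so strcoll uses it too.
    if i < len(words) and locale.strcoll(words[i], target) == 0:
        return i
    return -1
```

The index would be built once, e.g. words = sorted(names, key=lambda w: key(w, 'fi_FI.UTF-8')) and keys = [key(w, 'fi_FI.UTF-8') for w in words], assuming that locale is installed. On an affected system, key_is_stable('maupassant guy', 'fi_FI.UTF-8') returning False would reproduce the reported behaviour without the interactive session.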
Looking at the Python source, the only possible error case I can see is
that the wrapper assumes the string returned by strxfrm will be
null-terminated. The documentation I have doesn't make it 100% clear that
the string is guaranteed to be null-terminated, although it's implied, so
this is a remotely possible cause. You might try calling the C library
directly.

-- 
http://mail.python.org/mailman/listinfo/python-list
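Calling the C library directly, as suggested above, could look like the following Python 3 sketch using ctypes. It assumes a POSIX system where ctypes.util.find_library("c") locates libc, or where CDLL(None) exposes the symbols already loaded into the process. strxfrm_direct slices the buffer by strxfrm's return value instead of reading up to a terminating NUL, so it does not depend on the null-termination assumption discussed above. key_direct is a hypothetical helper that mirrors the thread's key().

```python
import ctypes
import ctypes.util
import locale

# Load the C library. find_library("c") usually yields e.g. "libc.so.6" on
# Linux; if it returns None, CDLL(None) falls back to the symbols already
# loaded into the process (POSIX only), which include strxfrm.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))
_libc.strxfrm.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
_libc.strxfrm.restype = ctypes.c_size_t


def strxfrm_direct(data: bytes) -> bytes:
    """Call the C library's strxfrm on raw bytes under the current LC_COLLATE.

    The result is sliced by strxfrm's return value rather than read up to a
    terminating NUL.
    """
    # With a NULL destination and n == 0, strxfrm only reports the length of
    # the transformed string (excluding the terminating NUL).
    needed = _libc.strxfrm(None, data, 0)
    buf = ctypes.create_string_buffer(needed + 1)
    written = _libc.strxfrm(buf, data, needed + 1)
    return buf.raw[:written]


def key_direct(text: str, locale_name: str) -> bytes:
    """Hypothetical helper: set LC_COLLATE, then transform the UTF-8 bytes."""
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return strxfrm_direct(text.encode("utf-8"))
```

Comparing key_direct('maupassant guy', 'en_US.UTF-8') with itself over repeated calls (assuming that locale is installed) would show whether the varying trailing bytes also appear without the Python wrapper in between.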