> > Hi all, > > > > > > I would like to sort according to this order: > > > > > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', > > > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', 'C', > > > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', 'f', > > > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', '?', > > > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', '?', > > > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', 'R', > > > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', 'v', > > > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') > > > > > > How can I do this? The default sorted() does not give the desired result. > > > > I'm assuming that doesn't correspond to some standard locale's collating > > order, so we really do need to roll our own encoding (and that you have > > a good reason for wanting to do this).
It is for creating a Dutch dictionary. This sorting order is not to be found in an existing locale. > I'm also assuming that what I'm > > seeing as question marks are really accented characters in some encoding > > that my news reader just isn't dealing with (it seems to think your post > > was in ISO-2022-CN (Simplified Chinese). > > > > I'm further assuming that you're starting with a list of unicode > > strings, the contents of which are limited to the above alphabet. Correct. > I'm > > even further assuming that the volume of data you need to sort is small > > enough that efficiency is not a huge concern. Well, it is for 200,000 - 450,000 words but the code is allowed be slow. It will not be used for web application or something which requires a quick response. > Given all that, I would start by writing some code which turned your > > alphabet into a pair of dicts. One maps from the code point to a > > collating sequence number (i.e. ordinals), the other maps back. > > Something like (for python 2.7): > > > > alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', > > '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?', > > [...] > > 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') > > > > map1 = {c: n for n, c in enumerate(alphabet)} > > map2 = {n: c for n, c in enumerate(alphabet)} OK, similar to Thomas' proposal. > Next, I would write some functions which encode your strings as lists of > > ordinals (and back again) > > > > def encode(s): > > "encode('foo') ==> [34, 19, 19]" # made-up ordinals > > return [map1[c] for c in s] > > > > def decode(l): > > "decode([34, 19, 19]) ==> 'foo'" > > return ''.join(map2[i] for i in l) > > > > Use these to convert your strings to lists of ints which will sort as > > per your specified collating order, and then back again: > > > > encoded_strings = [encode(s) for s in original_list] > > encoded_strings.sort() > > sorted_strings = [decode(l) for l in encoded_strings] > > > > That's just a rough sketch, and completely untested, but it should get > > you headed in the right direction. Or at least one plausible direction. > > Old-time perl hackers will recognize this as the Schwartzian Transform. I will test it and let you know. :) Pander -- http://mail.python.org/mailman/listinfo/python-list