Re: sort order for strings of digits

DJC Wed, 31 Oct 2012 16:53:12 -0700

On 31/10/12 23:09, Steven D'Aprano wrote:

On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:

The best I can think of is to split the input sequence into two lists,
sort each and then join them.


According to your example code, you don't have to split the input because
you already have two lists, one filled with numbers and one filled with
strings.

Sorry for the confusion, the pair of strings was just a way of testing variations on the input. So a sequence with any combination of strings that can be read as numbers and strings of chars that don't look like numbers (even if that string includes digits) is the expected input.


But I think that what you actually have is a single list of strings, and
you are supposed to sort the strings such that they come in numeric order
first, then alphanumerical. E.g.:

['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
=> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

Not quite, what I want is to ensure that if the strings look like numbers they are placed in numerical order, i.e. 1 2 3 10 100, not 1 10 100 2 3. Cases where a string has some leading digits can be treated as strings like any other.
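
A minimal sketch of that requirement as a sort key, for illustration only. The name numeric_first_key is made up here, and two things are assumptions rather than anything stated above: that the number-like strings should come before all the others, and that "looks like a number" means a plain run of ASCII digits.

```python
def numeric_first_key(label):
    """Sort key: digit-only strings by value, everything else as text."""
    # Assumption: only plain ASCII digit runs such as '10' count as numbers.
    # The isascii() guard (Python 3.7+) rejects characters like a superscript
    # two, which pass isdigit() but make int() raise ValueError.
    if label.isascii() and label.isdigit():
        return (0, int(label), '')
    # Assumption: everything else, including '1a', sorts after the numbers
    # and is compared as an ordinary string.
    return (1, 0, label)


labels = ['10', 'abc', '2', '1a', '100', '3', '1']
print(sorted(labels, key=numeric_first_key))
# prints ['1', '2', '3', '10', '100', '1a', 'abc']
```

The leading 0 or 1 in the tuple keeps integers from ever being compared against strings, which would raise TypeError in Python 3.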

At least that is what I would expect as the useful thing to do when
sorting.

Well, it depends on the use case. In my case the strings are column and row labels for a report. I want them to be presented in a convenient-to-read sequence, which the lexical sorting of the strings that look like numbers is not. I want a reasonable do-what-I-mean default sort order that can handle whatever strings are used.


The trick is to take each string and split it into a leading number and a
trailing alphanumeric string. Either part may be "empty". Here's a pure
Python solution:

from sys import maxsize  # use maxint in Python 2
def split(s):
     for i, c in enumerate(s):
         if not c.isdigit():
             break
     else:  # aligned with the FOR, not the IF
         return (int(s), '')
     return (int(s[:i] or maxsize), s[i:])

Now sort using this as a key function:

py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
py> sorted(L, key=split)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']


The above solution is not quite general:

* it doesn't handle negative numbers or numbers with a decimal point;

* it doesn't handle the empty string in any meaningful way;

* in practice, you may or may not want to ignore leading whitespace,
   or trailing whitespace after the number part;

* there's a subtle bug if a string contains a very large numeric prefix,
   finding and fixing that is left as an exercise.
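
One possible reading of the last point, offered as a guess rather than the intended answer: a numeric prefix larger than maxsize gets a key that sorts after the strings with no prefix at all, since those are all assigned maxsize. The sketch below sidesteps the sentinel by putting an explicit flag first in the key. The name split_key is hypothetical, and the other limitations listed above are left untouched.

```python
def split_key(s):
    """Sort key: (flag, leading number, remainder), with no sentinel value."""
    # Count the leading ASCII digits by hand, so that other Unicode digit
    # characters are not handed to int().
    digits = 0
    while digits < len(s) and s[digits] in '0123456789':
        digits += 1
    if digits == 0:
        # No numeric prefix (this includes the empty string): flag 1 puts
        # these after every string that does start with a number.
        return (1, 0, s)
    # Numeric prefix of any size: flag 0, then the value, then the rest.
    return (0, int(s[:digits]), s[digits:])


L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
print(sorted(L, key=split_key))
# prints ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
```

With this key a label such as '99999999999999999999999999x' still sorts before 'abc', because the flag is compared before the number.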

That looks more than general enough for my purposes! I will experiment along those lines, thank you.



--
http://mail.python.org/mailman/listinfo/python-list
