On 31/10/12 23:09, Steven D'Aprano wrote:
On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:
The best I can think of is to split the input sequence into two lists,
sort each and then join them.
According to your example code, you don't have to split the input because
you already have two lists, one filled with numbers and one filled with
strings.
Sorry for the confusion; the pair of strings was just a way of testing
variations on the input. The expected input is a sequence containing any
combination of strings that can be read as numbers and strings of
characters that don't look like numbers (even if they include digits).
But I think that what you actually have is a single list of strings, and
you are supposed to sort the strings such that they come in numeric order
first, then alphanumerical. E.g.:
['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
=> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
Not quite. What I want is to ensure that strings that look like numbers
are placed in numerical order, i.e. 1 2 3 10 100, not 1 10 100 2 3.
Strings that merely have some leading digits can be treated as
strings like any other.
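
A minimal sketch of that requirement, for illustration only. It assumes
that number-like labels should come before all other labels (the message
above does not say which group goes first), and the helper name label_key
is hypothetical:

```python
def label_key(s):
    """Sort key: all-digit strings numerically, everything else lexically.

    Assumption: number-like labels are placed before all other labels.
    """
    if s.isdecimal():
        # Group 0: compare by integer value; keep s as a tie-breaker so
        # that '7' and '007' still have a well-defined relative order.
        return (0, int(s), s)
    # Group 1: ordinary string comparison, even if s starts with digits.
    return (1, 0, s)


labels = ['1', '10', '100', '2', '3', '55a', 'abc', '1a']
print(sorted(labels, key=label_key))
# ['1', '2', '3', '10', '100', '1a', '55a', 'abc']
```

The leading 0 or 1 in each key keeps the two groups apart, so an integer
is never compared against a string.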
At least that is what I would expect as the useful thing to do when
sorting.
Well, it depends on the use case. In my case the strings are column and
row labels for a report, and I want them presented in a sequence that is
convenient to read, which the lexical ordering of number-like strings is
not. I want a reasonable do-what-I-mean default sort order that can
handle whatever strings are used.
The trick is to take each string and split it into a leading number and a
trailing alphanumeric string. Either part may be "empty". Here's a pure
Python solution:
from sys import maxsize  # use maxint in Python 2

def split(s):
    for i, c in enumerate(s):
        if not c.isdigit():
            break
    else:  # aligned with the FOR, not the IF
        return (int(s), '')
    return (int(s[:i] or maxsize), s[i:])
Now sort using this as a key function:
py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
py> sorted(L, key=split)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
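
To see why that order comes out, it can help to look at the keys
themselves. The sketch below repeats the split function from above
unchanged and prints the tuple it produces for a few of the inputs; a
string with no leading digits gets maxsize as its number, which is what
pushes it to the end:

```python
from sys import maxsize


def split(s):
    # Same logic as the key function quoted above.
    for i, c in enumerate(s):
        if not c.isdigit():
            break
    else:
        return (int(s), '')
    return (int(s[:i] or maxsize), s[i:])


for item in ['9', '55', '55a', 'abc']:
    # Tuples compare element by element: number first, then the remainder.
    print(repr(item), '->', split(item))
```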
The above solution is not quite general:
* it doesn't handle negative numbers or numbers with a decimal point;
* it doesn't handle the empty string in any meaningful way;
* in practice, you may or may not want to ignore leading whitespace,
or trailing whitespace after the number part;
* there's a subtle bug if a string contains a very large numeric prefix;
finding and fixing that is left as an exercise.
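
One possible way to address the caveats listed above, offered as a sketch
rather than as the intended answer to the exercise. The name split_general
and the regular expression are assumptions made for this example. On the
last caveat: in the original function a string without a numeric prefix is
given maxsize as its number, so a prefix larger than maxsize would sort
after the non-numeric strings. Using an explicit group flag instead of a
sentinel number avoids that:

```python
import re
from decimal import Decimal

# Optional leading whitespace, optional sign, digits, optional fraction,
# optional whitespace, then whatever is left of the string.
_NUM_PREFIX = re.compile(r'\s*([+-]?\d+(?:\.\d+)?)\s*(.*)',
                         re.ASCII | re.DOTALL)


def split_general(s):
    """Sort key: (group, number, remainder).

    Group 0 holds strings with a numeric prefix, group 1 everything else
    (including the empty string), so no sentinel number is needed.
    """
    m = _NUM_PREFIX.match(s)
    if m:
        # Decimal parses the prefix exactly, however long it is, and
        # handles negative numbers and decimal points.
        return (0, Decimal(m.group(1)), m.group(2).strip())
    return (1, Decimal(0), s.strip())


L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
print(sorted(L, key=split_general))
# ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
```

Whether whitespace should be ignored, and whether the empty string belongs
with the non-numeric strings, are choices made here for the example; the
use case may call for something different.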
That looks more than general enough for my purposes! I will experiment
along those lines, thank you.