> Sorry for the delay in replying. Thanks, and I'm in turn sorry myself, too. Here is my experiment:
I only looked at the first test case, where a bigfile.txt has occasionally None lines. To run this test, I created a file with 33079296 and 704512 lines, by repeatedly cat'ing /etc/passwd. I then ran this program: import time,random l = [] t=time.time() for line in open("x"): x = random.randint(0,100) if x < 4: l.append(None) else: l.append(line) print(time.time()-t) t=time.time() for i in range(10): L = l[:] L.sort() print(time.time()-t) def key(n): return n or "" t=time.time() for i in range(10): L = l[:] L.sort(key=key) print(time.time()-t) This key function is based on the assumption that there won't be any strings in l, which is reasonable, because they are all lines read from the input file (i.e. contain \n at least). If truly empty strings where possible, I'd map them to chr(0), which I would assume couldn't occur elsewhere in the input. With 2.5, I get these results (rounded) 3.8s 7.6s 14.3s So with the key function, the overhead appears to have doubled. I think the ratio should decrease as lists grow larger (because the key function is linear, and the sort is nlogn, assuming the overhead is in computing the key function), but I haven'd done any series varying the file size. With 3.0(rc3), I get (omitting the key-less sorting, as it won't work) 24.8s 15.9s So now the IO is much larger than the actual sorting, which is about the same in 2.5 and 3.0. If you wonder where the overhead of IO comes from: part of it probably from the UTF-8 decoding, that the IO library now does transparently. I don't know whether this overhead qualifies as "not insignificant", most people probably would claim that a doubling in time is indeed significant. OTOH, not knowing what the original problem is (e.g. whether it's interactive or in a batch mode, and whether bigfile.txt is much smaller or much larger than 32MiB), I would probably a) evaluate whether the increase in IO time is acceptable, and b) if it is, just go with 3.0, as the increase in sorting time is much smaller than the increase in IO time. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list