New submission from Daniel Stutzbach <dan...@stutzbachenterprises.com>:

(I've made an educated guess about who to add to the Nosy list)

The attached patch substantially speeds up sorting with the "key" parameter.  
It is purely a performance patch; from the user's point of view, the language 
and libraries are not changed in any other way.  I measured a reduction in 
execution time of at least 15% in many cases and more than 40% for large n.

I performed measurements on an Intel 64-bit Linux system using gcc 4.3.2 and on 
an Intel 32-bit Windows XP system using Visual C Express Edition 2009.

Previously, "key" was implemented by creating a sortwrapperobject for each 
element: a PyObject storing a key and a value and using only the key for 
comparison.
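
As a rough illustration (the real implementation is C code in 
Objects/listobject.c, not Python; the class and function names here are 
hypothetical), the old approach can be modeled like this:

```python
class SortWrapper:
    """Model of the C-level sortwrapperobject: holds a key and a value,
    compares using only the key."""
    __slots__ = ('key', 'value')

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __lt__(self, other):
        # list.sort() only requires __lt__
        return self.key < other.key


def sort_with_wrappers(values, key):
    """Sort `values` in place, modeling the pre-patch approach:
    allocate n wrapper objects, sort them, then unwrap."""
    wrapped = [SortWrapper(key(v), v) for v in values]  # n allocations
    wrapped.sort()
    values[:] = [w.value for w in wrapped]
```

Each element costs one wrapper allocation, and every comparison has to 
dereference through a wrapper scattered somewhere in the heap.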

With the patch, sortwrapperobject is removed entirely.  Instead, the sort uses 
two arrays: one for keys and one for values.  Comparisons use the keys array.  
Whenever a swap is performed, it is performed on both arrays.  If the "key" 
parameter is not provided to the sort, the second swap is skipped (since the 
keys are also the values).
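
A minimal sketch of the two-array idea (in Python for readability; the patch 
itself modifies timsort in C, and this uses a simple insertion sort instead):

```python
def sort_two_arrays(values, key=None):
    """Model of the patched approach: a parallel keys array is built once,
    comparisons read only `keys`, and swaps are mirrored into `values`."""
    # Without a key, the keys array *is* the values array, so the
    # second swap below is skipped.
    keys = [key(v) for v in values] if key is not None else values

    for i in range(1, len(keys)):
        j = i
        while j > 0 and keys[j] < keys[j - 1]:
            keys[j], keys[j - 1] = keys[j - 1], keys[j]
            if keys is not values:  # mirror the swap only when a key was given
                values[j], values[j - 1] = values[j - 1], values[j]
            j -= 1
    return values
```

Only one extra allocation (the keys list) is needed, and both arrays are 
contiguous, which is what buys the cache-locality win described below.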

Compared to the sortwrapperobject approach, speed is improved by:
- Requiring only 1 memory allocation (an array of PyObject *'s) instead of n 
memory allocations for n sortwrapperobjects
- Removing the need to dereference through sortwrapperobjects, which were 
scattered about in memory (i.e., had poor cache locality)
- Substantially reducing memory overhead, further improving cache performance
 
When the "key" parameter is not used, the code still needs to check whether it 
should perform a second swap.  However, the additional instructions are cache- 
and branch-prediction-friendly and have no noticeable impact on performance.  
I conducted enough experiments to establish a 95% confidence interval that 
excluded a slowdown of more than 1% (but did not exclude a slowdown of 0%, 
which is good).

A handful of results:

# No key, same speed
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()'
1000000 loops, best of 3: 0.276 usec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f()'
1000000 loops, best of 3: 0.276 usec per loop

# With a key, patched version is faster
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)'
1000000 loops, best of 3: 1.76 usec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(5))' -s 'f = x.sort' 'f(key=int)'
1000000 loops, best of 3: 1.5 usec per loop

# Results are more dramatic with large n
otto:~$ py3k-git-base/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)'
10 loops, best of 3: 35.2 msec per loop
otto:~$ py3k-git/python -m timeit -s 'x = list(range(100000))' -s 'f = x.sort' 'f(key=int)'
10 loops, best of 3: 22.4 msec per loop

I have been using a script to run a large battery of experiments with 
different values of n and different conditions (random data, sorted data, 
reverse-sorted data, key, no key, etc.).  The script works, but it's clunky to 
use.  I'm cleaning it up and hope to attach it to this issue within the next 
few days.

----------
assignee: stutzbach
components: Library (Lib)
files: sort-key-locality.diff
keywords: patch
messages: 117097
nosy: collinwinter, rhettinger, stutzbach, tim_one
priority: normal
severity: normal
stage: patch review
status: open
title: speeding up sorting with a key
type: performance
versions: Python 3.2
Added file: http://bugs.python.org/file18952/sort-key-locality.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9915>
_______________________________________