[issue21592] Make statistics.median run in linear time

Thomas Dybdahl Ahle Wed, 28 May 2014 05:14:29 -0700

New submission from Thomas Dybdahl Ahle:

The statistics module currently contains the following comment:


"FIXME: investigate ways to calculate medians without sorting? Quickselect?"

This is important, because users expect standard library functions to use state 
of the art implementations, and median by sorting has never been that.

There are many good linear time alternatives, the classical median-of-k 
algorithm was posted to the mailing list in a nice version by David Eppstein in 
2002 [1]. The fastest method in practice is probably Quick Select in the 
Floyd-Rivest version [2] which is similar to quick sort.

These algorithms also have the feature of letting us select any k-th order 
statistic, not just the median. This seems conceptually a lot simpler than the 
current median/median_low/median_high split.

However, sticking with the current api, a new implementation would have to 
support calculating a median as the average of n//2 and (n+1)//2'th order 
statistics. This could be implemented either by calling the select function 
twice, or by implementing a multi-select algorithm, which is also a well 
studied problem [3].

I'll be happy to contribute code, or help out in any other way.

[1]: https://mail.python.org/pipermail/python-list/2002-May/132170.html
[2]: https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm
[3]: 
https://domino.mpi-inf.mpg.de/intranet/ag1/ag1publ.nsf/0/59C74289A2291143C12571C30017DEA8/$file/Mehlhorn_a_2005_o.pdf

----------
components: Library (Lib)
messages: 219265
nosy: Thomas.Dybdahl.Ahle
priority: normal
severity: normal
status: open
title: Make statistics.median run in linear time
type: performance
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21592>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue21592] Make statistics.median run in linear time

Reply via email to