New submission from Thomas Dybdahl Ahle: The statistics module currently contains the following comment:
"FIXME: investigate ways to calculate medians without sorting? Quickselect?" This is important, because users expect standard library functions to use state of the art implementations, and median by sorting has never been that. There are many good linear time alternatives, the classical median-of-k algorithm was posted to the mailing list in a nice version by David Eppstein in 2002 [1]. The fastest method in practice is probably Quick Select in the Floyd-Rivest version [2] which is similar to quick sort. These algorithms also have the feature of letting us select any k-th order statistic, not just the median. This seems conceptually a lot simpler than the current median/median_low/median_high split. However, sticking with the current api, a new implementation would have to support calculating a median as the average of n//2 and (n+1)//2'th order statistics. This could be implemented either by calling the select function twice, or by implementing a multi-select algorithm, which is also a well studied problem [3]. I'll be happy to contribute code, or help out in any other way. [1]: https://mail.python.org/pipermail/python-list/2002-May/132170.html [2]: https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm [3]: https://domino.mpi-inf.mpg.de/intranet/ag1/ag1publ.nsf/0/59C74289A2291143C12571C30017DEA8/$file/Mehlhorn_a_2005_o.pdf ---------- components: Library (Lib) messages: 219265 nosy: Thomas.Dybdahl.Ahle priority: normal severity: normal status: open title: Make statistics.median run in linear time type: performance versions: Python 3.4, Python 3.5 _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue21592> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com