[issue11180] More efficient nlargest()/nsmallest()

2011-02-12 Thread Raymond Hettinger
Changes by Raymond Hettinger: Removed file: http://bugs.python.org/file20748/heapq_benchmark.py

[issue11180] More efficient nlargest()/nsmallest()

2011-02-12 Thread Raymond Hettinger
Raymond Hettinger added the comment: Updated the benchmarking code to include a 4th variant that accumulates sorted sublists during the partitioning phase. Results from one run:
n: 10 k: 100
[105856, 105917, 105951, 105977, 106366] nsmallest
[166465, 166478, 166507, 166639, 166748]
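For readers without the attachment, a generic partition-based nsmallest looks roughly like the sketch below (average O(n + k log k) time but O(n) memory; the exact 4th variant, which accumulates sorted sublists during partitioning, lives in the attached heapq_benchmark.py and differs in detail):

    import random

    def nsmallest_partition(k, iterable):
        """Quickselect-style nsmallest: average O(n + k log k), needs O(n) memory."""
        data = list(iterable)
        result = []
        while data and len(result) < k:
            pivot = data[random.randrange(len(data))]
            below = [x for x in data if x < pivot]
            equal = [x for x in data if x == pivot]
            if len(result) + len(below) >= k:
                data = below                          # the answer lies strictly below the pivot
            else:
                result.extend(below)                  # everything below the pivot is kept
                result.extend(equal[:k - len(result)])
                data = [x for x in data if x > pivot]
        return sorted(result)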

[issue11180] More efficient nlargest()/nsmallest()

2011-02-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: FWIW, I've attached benchmarking code for all three approaches (the current approach and the two proposals). -- Added file: http://bugs.python.org/file20748/heapq_benchmark.py

[issue11180] More efficient nlargest()/nsmallest()

2011-02-11 Thread Mark Dickinson
Mark Dickinson added the comment: [Raymond] > Also, I question your assertions about running time. [...] Interesting observation. I wonder what the average number of comparisons actually is, for random inputs. A quick back-of-the-envelope calculation suggests that O(max(k*log(k)*log(n), n)) comparisons should be enough on average.
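As an aside (not part of the original thread), the average-case figure can be checked empirically: for random input, the i-th element displaces something in the size-k heap with probability about k/i, so the expected number of O(log k) replacements is roughly k*ln(n/k), which matches the O(max(k*log(k)*log(n), n)) estimate above:

    import heapq
    import math
    import random

    def count_replacements(n, k):
        """Count heap replacements while selecting the k largest of n random values."""
        heap = [random.random() for _ in range(k)]
        heapq.heapify(heap)
        replacements = 0
        for _ in range(n - k):
            x = random.random()
            if x > heap[0]:
                heapq.heapreplace(heap, x)
                replacements += 1
        return replacements

    n, k, trials = 100000, 100, 20
    observed = sum(count_replacements(n, k) for _ in range(trials)) / trials
    print("observed: %.0f   predicted ~ k*ln(n/k) = %.0f" % (observed, k * math.log(n / k)))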

[issue11180] More efficient nlargest()/nsmallest()

2011-02-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: The current routines are designed to consume only k elements of memory. That is the way it should remain. Manifesting the entire input iterable into memory is unnecessary and is not cache friendly. Also, I question your assertions about running time. In
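The memory point can be demonstrated directly (an illustrative aside using tracemalloc, not from the thread): heapq.nsmallest keeps at most k items live while consuming a generator, whereas sorting must first materialize the whole input:

    import heapq
    import random
    import tracemalloc

    def peak_kib(func, *args):
        """Peak memory allocated (KiB) while running func(*args)."""
        tracemalloc.start()
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        return peak / 1024

    def stream(n=1000000):
        return (random.random() for _ in range(n))

    print("nsmallest(10): %8.0f KiB" % peak_kib(heapq.nsmallest, 10, stream()))
    print("sorted()[:10]: %8.0f KiB" % peak_kib(lambda it: sorted(it)[:10], stream()))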

[issue11180] More efficient nlargest()/nsmallest()

2011-02-10 Thread Benjamin Peterson
Benjamin Peterson added the comment: I think you should implement and benchmark one of them. I'm dubious that the better asymptotic values will really translate into better performance. For example, the O(n) median selection algorithm has a large constant factor associated with it.
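A minimal harness in that spirit might look like the following (an illustration only; the heapq_benchmark.py file attached later in the thread is the actual comparison of the competing variants):

    import heapq
    import random
    import timeit

    def bench(n, k, repeat=5, number=10):
        data = [random.random() for _ in range(n)]
        candidates = [
            ("heapq.nsmallest", lambda: heapq.nsmallest(k, data)),
            ("sorted()[:k]",    lambda: sorted(data)[:k]),
        ]
        for name, stmt in candidates:
            best = min(timeit.repeat(stmt, number=number, repeat=repeat))
            print("%-16s n=%d k=%d  %.4f s" % (name, n, k, best))

    bench(100000, 100)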

[issue11180] More efficient nlargest()/nsmallest()

2011-02-10 Thread R. David Murray
Changes by R. David Murray: -- nosy: +rhettinger

[issue11180] More efficient nlargest()/nsmallest()

2011-02-10 Thread newacct
New submission from newacct: heapq.nlargest()/nsmallest() currently use a size-k heap to get the k largest/smallest items. This takes O(n log k): heapifying the first k elements (O(k)); iterating through the remaining (n-k) elements of the list, where each insertion/deletion takes O(log k), for O((n-k) log k); and sorting the final k-element heap (O(k log k)).
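For illustration, the size-k heap strategy described above can be sketched as follows (a simplified sketch, not the actual heapq implementation, which also handles key functions and tie-breaking):

    import heapq
    from itertools import islice

    def nlargest_sketch(k, iterable):
        """k largest items via a size-k min-heap: O(n log k) worst case."""
        it = iter(iterable)
        heap = list(islice(it, k))               # first k items
        if not heap:
            return []
        heapq.heapify(heap)                      # O(k)
        for item in it:                          # remaining n-k items
            if item > heap[0]:                   # beats the smallest of the current top k
                heapq.heapreplace(heap, item)    # O(log k) per replacement
        heap.sort(reverse=True)                  # O(k log k)
        return heap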