Dmitry Chichkov wrote: > Given: a large list (10,000,000) of floating point numbers; > Task: fastest python code that finds k (small, e.g. 10) smallest > items, preferably with item indexes; > Limitations: in python, using only standard libraries (numpy & scipy > is Ok); > > I've tried several methods. With N = 10,000,000, K = 10 The fastest so > far (without item indexes) was pure python implementation > nsmallest_slott_bisect (using bisect/insert). And with indexes > nargsmallest_numpy_argmin (argmin() in the numpy array k times). > > Anyone up to the challenge beating my code with some clever selection > algorithm?
If you don't care about stability, i. e. whether nlargest(2, [1, 2, 2.0]) returns [1, 2] or [1, 2.0], use _heapq.nlargest() directly $ python nsmallest_perf2.py nsmallest --> 0.142784833908 nsmallest_slott_bisect --> 0.19291305542 $ cat nsmallest_perf2.py from random import randint, random import time from bisect import insort from itertools import islice import _heapq _funcs = [] def timeit(f): _funcs.append(f) def time_all(*args): funcs = _funcs width = max(len(f.__name__) for f in funcs) prev = None for f in funcs: start = time.time() result = f(*args) end = time.time() print "%-*s --> %10s" % (width, f.__name__, end - start) if prev is None: prev = result else: assert prev == result timeit(_heapq.nsmallest) @timeit def nsmallest_slott_bisect(n, iterable, insort=insort): it = iter(iterable) mins = sorted(islice(it, n)) for el in it: if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates insort(mins, el) mins.pop() return mins test_data = [randint(10, 50) + random() for i in range(10**6)] K = 10 time_all(K, test_data) Peter -- http://mail.python.org/mailman/listinfo/python-list