On Sep 2, 7:59 am, Peter Otten <__pete...@web.de> wrote: > Dmitry Chichkov wrote: > > Given: a large list (10,000,000) of floating point numbers; > > Task: fastest python code that finds k (small, e.g. 10) smallest > > items, preferably with item indexes; > > Limitations: in python, using only standard libraries (numpy & scipy > > is Ok); > > > I've tried several methods. With N = 10,000,000, K = 10 The fastest so > > far (without item indexes) was pure python implementation > > nsmallest_slott_bisect (using bisect/insert). And with indexes > > nargsmallest_numpy_argmin (argmin() in the numpy array k times). > > > Anyone up to the challenge beating my code with some clever selection > > algorithm? > > If you don't care about stability, i. e. whether nlargest(2, [1, 2, 2.0]) > returns [1, 2] or [1, 2.0], use > > _heapq.nlargest() directly > > $ python nsmallest_perf2.py > nsmallest --> 0.142784833908 > nsmallest_slott_bisect --> 0.19291305542 > $ cat nsmallest_perf2.py > from random import randint, random > import time > from bisect import insort > from itertools import islice > import _heapq > > _funcs = [] > def timeit(f): > _funcs.append(f) > > def time_all(*args): > funcs = _funcs > width = max(len(f.__name__) for f in funcs) > prev = None > for f in funcs: > start = time.time() > result = f(*args) > end = time.time() > print "%-*s --> %10s" % (width, f.__name__, end - start) > if prev is None: > prev = result > else: > assert prev == result > > timeit(_heapq.nsmallest) > > @timeit > def nsmallest_slott_bisect(n, iterable, insort=insort): > it = iter(iterable) > mins = sorted(islice(it, n)) > for el in it: > if el <= mins[-1]: #NOTE: equal sign is to preserve duplicates > insort(mins, el) > mins.pop() > return mins > > test_data = [randint(10, 50) + random() for i in range(10**6)] > K = 10 > > time_all(K, test_data) > > Peter
I get: 1.46s for _heapq.nsmallest 0.85s for nsmallest_slott_bisect2 (version I posted) I am a bit surprised that mine is so slow compared with yours. I'll do more tests later! -- Arnaud -- http://mail.python.org/mailman/listinfo/python-list