On Tue, Aug 19, 2014 at 12:37 PM, Chiu Hsiang Hsu <wdv47...@gmail.com> wrote:
> On Tuesday, August 19, 2014 5:42:27 AM UTC+8, Dan Stromberg wrote:
>> On Mon, Aug 18, 2014 at 10:18 AM, Chiu Hsiang Hsu <wdv47...@gmail.com> wrote:
>> > I know that Python uses Timsort as its default sorting algorithm and
>> > it is efficient,
>> > but I just want a partial sort (the n largest/smallest elements).
>>
>> Perhaps heapq with PyPy? Or with nuitka? Or with numba?
>
> Another problem with heapq is memory usage: it costs a lot more memory
> in CPython (I tested it in 3.4 with 1000000 float numbers) compared
> to sorted.

This surprises me. I believe heapq keeps its values in a plain Python
list with no extra references, treating list elements 2*i+1 and 2*i+2
as the left and right children of node i (indices are zero-based).

A heap of some sort is probably best algorithmically. You're probably
just up against a high constant factor. On the other hand, there are
many kinds of heaps.

> Out of curiosity, there are many speed-up solutions for Python (like
> Cython, PyPy). I haven't used Cython before;
> I guess PyPy is a more convenient way to speed up existing Python code (?),
> so how does Cython compare to PyPy? (speed, code, flexibility, or
> anything else)

PyPy is really fast for CPU-intensive workloads, but CPython is better
for I/O.

I tested a single CPU-intensive microbenchmark on Cython and PyPy
(also Jython and CPython). PyPy was fastest
(http://stromberg.dnsalias.org/~strombrg/backshift/documentation/performance/index.html).

I haven't yet compared numba, nuitka, or Shedskin.

When you use heapq, are you putting all the values in the heap, or
just up to n at a time (evicting the worst value, one at a time, as
you go)?

If you're doing the former, it's basically a heapsort, which probably
won't beat timsort. If you're doing the latter, that should be pretty
good.

--
https://mail.python.org/mailman/listinfo/python-list
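
To make the two heapq strategies mentioned above concrete, here is a
minimal sketch of both. The function names (n_largest_bounded,
n_largest_full_heap) are made up for illustration and are not part of
heapq. The first keeps at most n items in the heap and evicts the worst
as it goes; the second heapifies everything and pops n times, which is
essentially a partial heapsort. I have not timed either one here, so
treat it as an illustration of the idea, not a benchmark.

```python
import heapq
import random


def n_largest_bounded(values, n):
    """Return the n largest items, largest first, keeping at most n in the heap."""
    if n <= 0:
        return []
    heap = []  # min-heap: heap[0] is the worst of the best n seen so far
    for value in values:
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Evict the current worst and insert the newcomer in one step.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


def n_largest_full_heap(values, n):
    """Heapify every value, then pop n times (effectively a partial heapsort)."""
    heap = [-v for v in values]  # negate so the min-heap behaves like a max-heap
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(n, len(heap)))]


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(100000)]
    expected = sorted(data, reverse=True)[:10]
    assert n_largest_bounded(data, 10) == expected
    assert n_largest_full_heap(data, 10) == expected
    # The standard library already offers the same result directly.
    assert heapq.nlargest(10, data) == expected
```

The bounded version only ever holds n items in its heap, so its extra
memory does not grow with the size of the input, whereas the full-heap
version builds a second list as large as the input. In practice
heapq.nlargest and heapq.nsmallest from the standard library are
probably the first thing to try.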