A Wednesday 14 November 2007, Francesc Altet escrigué:
> I think I've got messed on some benchmarks that I've done on that
> subject some time ago, but either my memory is bad or I've made some
> mistake on those experiments.  My apologies.

Just for the record.  I was unable to stop thinking about this, and 
after some investigation, I guess that my rememberings were gathered 
from some benchmarks with code in Pyrex (i.e. pretty near to C speed).

In the attached benchmark, I compare the speed of dictionary lookups in 
a big dictionary (more than 8 millions of entries) against doing the 
same, but using a binary search approach.  Here are the results:

$ python2.5 gq-binary-search.py
Creating the dictionary...
Time for dict creation: 13.001
Calibrating loop...
Calibrating time: 3.095
Timing queries with a dict...
Time for dict query: 7.747
Timing queries with binary search...
Time for binary queries: 3.551

so, a dictionary lookup takes around 950 ns, while a binary search 
lookup takes around 380 ns, i.e. binary searches are more than 2x 
faster than dictionary lookups, at least in this benchmark.

It might well be that the difference is due to the fact that the binary 
lookup in this case is made for only 64-bit integers, while the 
dictionary code has to check the type of index data first.  OTOH, it is 
also apparent that the binary code can be optimized for cache misses by 
using two levels of indirection for binary lookups, but this is too 
much work for today ;).  At any rate, it seems rather clear that binary 
lookups can be competive against hashes, when using Pyrex at least!

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

Attachment: gq-binary-search.py
Description: application/python

Attachment: setup.py
Description: application/python

# BinarySearch class to demonstrate that binary search is competitive
# against dictionaries based on hashes

# In order to access the data area in NumPy objects
cdef extern from "numpy/arrayobject.h":

  ctypedef extern class numpy.ndarray [object PyArrayObject]:
    cdef char *data

  void import_array()

import_array()


cdef long bisect_left(long long *a, long long x, long hi):
    cdef long lo, mid

    lo = 0
    if x <= a[0]: return 0
    if a[-1] < x: return hi
    while lo < hi:
        mid = (lo+hi)/2
        if a[mid] < x: lo = mid+1
        else: hi = mid
    return lo


cdef class BinarySearch:
    cdef long len
    cdef ndarray str2
    cdef long *revidd
    cdef long long *sidd

    def __new__(self, ndarray sid, ndarray revid, ndarray str2):
        self.str2 = str2
        self.revidd = <long *>revid.data
        self.sidd = <long long *>sid.data
        self.len = len(str2) - 1

    def __getitem__(self, x):
        cdef long pos, idx

        pos = bisect_left(self.sidd, x, self.len)
        idx = self.revidd[pos]
        return self.str2[idx]
-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to