A Wednesday 14 November 2007, Francesc Altet escrigué: > I think I've got messed on some benchmarks that I've done on that > subject some time ago, but either my memory is bad or I've made some > mistake on those experiments. My apologies.
Just for the record. I was unable to stop thinking about this, and after some investigation, I guess that my rememberings were gathered from some benchmarks with code in Pyrex (i.e. pretty near to C speed). In the attached benchmark, I compare the speed of dictionary lookups in a big dictionary (more than 8 millions of entries) against doing the same, but using a binary search approach. Here are the results: $ python2.5 gq-binary-search.py Creating the dictionary... Time for dict creation: 13.001 Calibrating loop... Calibrating time: 3.095 Timing queries with a dict... Time for dict query: 7.747 Timing queries with binary search... Time for binary queries: 3.551 so, a dictionary lookup takes around 950 ns, while a binary search lookup takes around 380 ns, i.e. binary searches are more than 2x faster than dictionary lookups, at least in this benchmark. It might well be that the difference is due to the fact that the binary lookup in this case is made for only 64-bit integers, while the dictionary code has to check the type of index data first. OTOH, it is also apparent that the binary code can be optimized for cache misses by using two levels of indirection for binary lookups, but this is too much work for today ;). At any rate, it seems rather clear that binary lookups can be competive against hashes, when using Pyrex at least! Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
gq-binary-search.py
Description: application/python
setup.py
Description: application/python
# BinarySearch class to demonstrate that binary search is competitive # against dictionaries based on hashes # In order to access the data area in NumPy objects cdef extern from "numpy/arrayobject.h": ctypedef extern class numpy.ndarray [object PyArrayObject]: cdef char *data void import_array() import_array() cdef long bisect_left(long long *a, long long x, long hi): cdef long lo, mid lo = 0 if x <= a[0]: return 0 if a[-1] < x: return hi while lo < hi: mid = (lo+hi)/2 if a[mid] < x: lo = mid+1 else: hi = mid return lo cdef class BinarySearch: cdef long len cdef ndarray str2 cdef long *revidd cdef long long *sidd def __new__(self, ndarray sid, ndarray revid, ndarray str2): self.str2 = str2 self.revidd = <long *>revid.data self.sidd = <long long *>sid.data self.len = len(str2) - 1 def __getitem__(self, x): cdef long pos, idx pos = bisect_left(self.sidd, x, self.len) idx = self.revidd[pos] return self.str2[idx]
-- http://mail.python.org/mailman/listinfo/python-list