A Tuesday 20 November 2007, Istvan Albert escrigué: > On Nov 19, 2:33 pm, Francesc Altet <[EMAIL PROTECTED]> wrote: > > Just for the record. I was unable to stop thinking about this, and > > after some investigation, I guess that my rememberings were > > gathered from some benchmarks with code in Pyrex (i.e. pretty near > > to C speed). > > Pretty interesting and quite unexpected.
Yeah, and also bogus :( It turned out that in my previous benchmark, I've used plain numpy int arrays to do measurements, and when you extract an element out of a numpy array what you obtain is a *numpy scalar*, which is a different beast than a python (long) integer. Unfortunately, pyrex does swallow it and happily converted to a long long C, but produces a wrong result. This looks like a pyrex fault, and I should report it to the pyrex list. At any rate, with the corrected benchmarks using plain python lists instead (attached), the numbers are: Calibrating loop... Calibrating time: 1.458 Creating the dictionary... Time for dict creation: 15.305 Timing queries with a dict... Time for dict query: 11.632 Creating BinarySearch object... Time for BinarySearch creation: 8.648 Timing queries with a BinarySearch... Time for BinarySearch queries: 19.393 That is, dictionaries do lookups 1.7x faster than a binary lookup (1.42 us/lookup vs 2.37 us/lookup), which is, I suppose, what is expected. Well, this seems the end of my 'peculiar' theory, but at least served to uncover what it seems a bug in pyrex :P -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-"
# BinarySearch class to demonstrate that binary search is competitive # against dictionaries based on hashes import numpy # In order to access the data area in NumPy objects cdef extern from "numpy/arrayobject.h": ctypedef extern class numpy.ndarray [object PyArrayObject]: cdef char *data void import_array() import_array() cdef long bisect_left(long long *a, long long x, long lo, long hi): cdef long mid while lo < hi: mid = (lo+hi)/2 if a[mid] < x: lo = mid+1 else: hi = mid return lo cdef class BinarySearch: cdef long len cdef object values cdef long *revidd cdef long long *sidd cdef ndarray sid, revid def __new__(self, object keys, object values): self.sid = numpy.array(keys) self.revid = self.sid.argsort() self.revidd = <long *>self.revid.data self.sid.sort() self.sidd = <long long *>self.sid.data self.values = values self.len = len(values) - 1 def __getitem__(self, x): cdef long pos, idx pos = bisect_left(self.sidd, x, 0, self.len) idx = self.revidd[pos] return self.values[idx]
gq-binary-search.py
Description: application/python
setup.py
Description: application/python
-- http://mail.python.org/mailman/listinfo/python-list