On 17 March 2014 21:16, Daniel Stutzbach <stutzb...@google.com> wrote:
> On Fri, Mar 14, 2014 at 6:13 PM, Joshua Landau <jos...@landau.ws> wrote:
>>
>> Now, I understand there are downsides to blist. Particularly, I've
>> looked through the "benchmarks" and they seem untruthful.
>
> I worked hard to make those benchmarks as fair as possible. I recognize
> that evaluating your own work always runs the risk of introducing hidden
> biases, and I welcome input on how they could be improved.

Thanks.

First, I want to state that there are two aspects to my claim.

The first is that these benchmarks do not represent typical use-cases. I
will not go too far into this, though, because it's mostly obvious.

The second is the flaws in the benchmarks themselves. I'll go through, in
turn, some that are apparent to me:

"Create from an iterator" gives me rather different results when I run it
(Python 3).

"Delete a slice" is skewed by its inclusion of multiplication, which is far
faster on blists. I admit that it's not obvious how to fix this.

"First in, first out (FIFO)" should be "x.append(0); x.pop(0)".

"Last in, first out (LIFO)" should use "pop()" over "pop(-1)", although I
admit it shouldn't make a meaningful difference.

"Sort *" are really unfair because they put initialisation in the timed part
and all have keys. The benchmarks on GitHub are less bad, but the website
really should include all of them and fix the remaining problems. I do
understand that TimSort isn't the best-suited algorithm, though, so I won't
read too far into these results.

Further, some of these tests don't show growth where they should, such as in
getitem. The growth is readily apparent when measured as such:

    python -m timeit -s "from random import choice; import blist; lst = blist.blist(range(10**0))" "choice(lst)"
    1000000 loops, best of 3: 1.18 usec per loop

    python -m timeit -s "from random import choice; import blist; lst = blist.blist(range(10**8))" "choice(lst)"
    1000000 loops, best of 3: 1.56 usec per loop

Lower size ranges are hidden by the function-call overhead. Perhaps this
effect is to do with caching, in which case the limits of the cache should
be explained more clearly.

Nevertheless, my enthusiasm for blist as an alternative stdlib
implementation remains. There are obvious and large advantages to be had,
sometimes where you wouldn't even expect them. The slower aspects of blist
are also rarely part of the bottlenecks of programs.

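To make the suggestions about the sort, FIFO and getitem benchmarks concrete, here is a rough sketch of how they might be timed. This is not the blist benchmark suite. The helper names (time_sort, time_fifo, time_getitem) and the "factory" parameter are made up for illustration. It defaults to the built-in list. Passing blist.blist instead assumes the third-party blist package is installed.

```python
import random
import timeit


def time_sort(factory=list, size=10_000, repeat=5):
    """Best time in seconds to sort `size` shuffled ints, setup excluded."""
    state = {}

    def setup():
        # Runs once per repeat, outside the timed region, so every timed
        # sort starts from a freshly shuffled sequence.
        values = list(range(size))
        random.shuffle(values)
        state["lst"] = factory(values)

    def stmt():
        state["lst"].sort()  # no key function

    return min(timeit.repeat(stmt, setup, repeat=repeat, number=1))


def time_fifo(factory=list, size=10_000, number=1_000):
    """Mean time in seconds of one append-then-pop(0) step at fixed length."""
    lst = factory(range(size))

    def stmt():
        lst.append(0)
        lst.pop(0)

    return timeit.timeit(stmt, number=number) / number


def time_getitem(factory=list, size=10_000, number=100_000):
    """Mean time in seconds of one index lookup in a `size`-element sequence."""
    lst = factory(range(size))
    index = size // 2
    return timeit.timeit(lambda: lst[index], number=number) / number
```

Calling, say, time_getitem(size=10**n) for a range of n and comparing the results is one way to look for the growth described above. The lambda adds its own call overhead, so small differences at low sizes may still be hidden.
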
So yeah, go for it.