On 18 March 2014 01:01, Daniel Stutzbach <stutzb...@google.com> wrote:
> I would love to include macro-benchmarks.  I keep waiting for the PyPy
> benchmark suite to get ported to Python 3...

*grins*

>> "Delete a slice" is fudged by its inclusion of multiplication, which
>> is far faster on blists. I admit that it's not obvious how to fix
>> this.
>
> I could move the initialization into the timed part, similar to what I did
> for sort (see below).  That has downsides too, of course, but it might be
> an improvement.

You could try making a baseline and subtracting it:

    timer("del x[len(x)//4:3*len(x)//4]; x *= 2") - timer("x * 2")

Not ideal, but closer, assuming that the multiplication isn't much
larger than the deletion. The errors would be summed, of course.
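Concretely, I'm imagining something like the stand-alone sketch below,
using timeit directly rather than whatever timer() helper your script
actually defines; the size and repeat counts are just placeholders:

    import timeit

    n = 10**5  # arbitrary size, purely for illustration
    setup = "x = list(range(%d))" % n  # swap in blist(range(n)) for the other curve

    # The statement under test: delete the middle half, then *= 2 so that
    # every iteration starts from a list of the same length.
    combined = min(timeit.repeat(
        "del x[len(x)//4:3*len(x)//4]; x *= 2",
        setup=setup, repeat=5, number=100))

    # Baseline: the multiplication on its own.  Only approximate, since here
    # x is full length while the *= 2 above runs on the surviving half.
    baseline = min(timeit.repeat(
        "x * 2",
        setup=setup, repeat=5, number=100))

    # The difference is roughly the cost of the deletion itself; note that
    # the errors of the two measurements add rather than cancel.
    print(combined - baseline)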
>> "Sort *" are really unfair because they put initialisation in the
>> timed part
>
> That's a limitation of timeit.  The setup step is only executed once.  If I
> put the initialization there, every sort after the first one would be
> sorting a pre-sorted list.  If you compare the "Create from an iterator"
> and "Sort a random list" results, you'll see that the initialization cost
> is dwarfed by the sorting cost for n > 15 or so.

This argument is slightly less convincing without the overhead of the
keys.  It might be worth doing a subtraction and adding some error
bars, as I suggest above.  Nevertheless, I do agree for n above some
small value, which is all that matters anyway.

>> and all have keys.
>
> If you use classes with __lt__ methods instead of keys, the cost is
> dominated by the calls to __lt__.  You're right that I should include
> both, though.

This argument doesn't make sense to me.  The only time that happens is
when you start with a non-primitive and the key transformation gives
you a primitive with optimised comparisons.  That typically only
happens when the key is a getitem or getattr, in which case the key is
just meaningless overhead.  I see little reason to care about the
key's cost in those cases.

> That's definitely a cache issue, which is always a risk with
> micro-benchmarks.
>
> I agree it's more interesting to pick items randomly instead of always
> querying the same index.  The overhead of choice() is kind of a problem,
> though.  Since I'm only plotting up to 10**5, I'd expect these to look
> more or less flat.

You could try jumping around to avoid the cache without using random
numbers.  Something like "idx = (idx + LARGE_PRIME) % n" should have
less overhead than choice() (rough sketch in the P.S. below), and the
subtraction method would probably work fine for cancelling out what
overhead it does have.

Also, I don't think the cache effect is all bad.  Chances are a lot of
real list accesses have a lot of data locality anyway.

> Thanks for all of the feedback.

Thanks in turn for the module :).
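P.S. Here's the same kind of stand-alone sketch for the jumping-around
idea, again with timeit and made-up numbers; LARGE_PRIME only needs to
be coprime with n so that every index gets visited:

    import timeit

    n = 10**5
    LARGE_PRIME = 49999  # anything coprime with n works

    setup = "x = list(range(%d)); idx = 0" % n  # or blist(range(n)) for comparison

    # Jump around the list instead of calling random.choice(): consecutive
    # indices land far apart, so we aren't just re-reading one hot cache line.
    access = min(timeit.repeat(
        "idx = (idx + %d) %% %d; x[idx]" % (LARGE_PRIME, n),
        setup=setup, repeat=5, number=100000))

    # Baseline: the index arithmetic alone, so it can be subtracted back out.
    overhead = min(timeit.repeat(
        "idx = (idx + %d) %% %d" % (LARGE_PRIME, n),
        setup=setup, repeat=5, number=100000))

    print(access - overhead)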