stefan brunthaler <s.bruntha...@uci.edu> added the comment:

> This looks quite impressive, so sorry for immediately jumping in with
> criticism. -- I've benchmarked the things I worked on, and I can't see
> any speedups but some significant slowdowns. This is on 64-bit Linux
> with a Core 2 Duo, both versions compiled with just `./configure && make`:
Well, no problem -- I don't actually consider it criticism at all. The build is correct; you can verify that the interpreter is working adequately by running the test suite and seeing a few tests that depend on specific bytecodes fail (test_dis and test_importlib, AFAIR). I don't have a Core 2 Duo available for testing, though.

> Modules/_decimal/tests/bench.py:
> --------------------------------
>
> Not much change for floats and decimal.py, 8-10% slowdown for _decimal!

This result is not unexpected, as I have no inline cached versions of functions using this module. The derivatives I generate work for Long, Float, and Complex numbers (plus Unicode strings and some others). If there is a clear need, I can of course look into that and add these derivatives (as I said, there are still some 40+ opcodes unused).

> Memoryview:
> -----------
>
> ./python -m timeit -n 10000000 -s "x = memoryview(bytearray(b'x'*10000))"
> "x[:100]"
>
> 17% (!) slowdown.

Hm, the 17% slowdown seems strange to me. However, I don't expect to see any speedups in this case, as there is no repeated execution within the benchmark code that could leverage type feedback via inline caching. You should see the biggest speedups with for-loops (FOR_ITER has optimized derivatives), if-statements (COMPARE_OP has optimized derivatives), and mathematical code. In addition, there are some optimizations for frequently executed function calls, unpacked sequences, etc. Note: "frequent" as in how I encountered them; this probably needs adjustment for different use cases.

> Did I perhaps miss some option to turn on the optimizations?

That does not seem to be the case, but if you could verify by running the regression tests, we could easily rule out this scenario. You could also verify the speedups using Computer Language Benchmarks Game programs, primarily binarytrees, mandelbrot, nbody, and spectralnorm, just to see how much you *should* gain on your machine. Testing methodology could also make a difference.
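To illustrate, a sketch of the kind of loop- and comparison-heavy microbenchmark that should benefit from the FOR_ITER and COMPARE_OP derivatives, measured with repeated runs (the function name, loop sizes, and repetition counts here are made up for illustration, not taken from the patch):

```python
import timeit

def count_below(limit, threshold):
    # Loop- and comparison-heavy workload: repeated execution of the
    # for-loop (FOR_ITER) and the comparison (COMPARE_OP) gives the
    # interpreter the type feedback needed to switch to the optimized
    # bytecode derivatives.
    count = 0
    for i in range(limit):
        if i < threshold:
            count += 1
    return count

# Repeat the measurement several times and take the minimum,
# to reduce noise from scheduler interference.
timings = timeit.repeat(lambda: count_below(100000, 50000),
                        number=10, repeat=30)
print(min(timings))
```

Unlike the one-shot memoryview slice above, this workload re-executes the same bytecodes many times within a single call, which is where inline caching can pay off.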
I use the following:

- Linux 3.0.0-17 (Ubuntu)
- gcc version 4.6.1
- nice -n -20 to minimize scheduler interference
- 30 repetitions per benchmark

I hope that helps/explains,
regards,
--stefan

----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14757>
_______________________________________