Jeethu Rao <jee...@jeethurao.com> added the comment:

Victor: I’m booting with the isolcpus and rcu_nocbs flags, and running pyperformance with the --affinity flag to pin the benchmark to the isolated CPU cores. I’ve also run `perf system tune`. The OS is Ubuntu 17.10. Thanks for the tip on using perf timeit instead of timeit. I’ve run the benchmark you suggested, with a minor change to avoid the cost of LOAD_ATTR, and attached the output in a gist[1].
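To illustrate the LOAD_ATTR point, here is a minimal sketch and not the actual benchmark (that one is in the gist[1]). The setup and statement strings below are hypothetical stand-ins. The idea is that binding `l.insert` to a name in the setup code moves the attribute lookup out of the timed statement, so the measurement is dominated by the insert itself. The opcode is named LOAD_ATTR or LOAD_METHOD depending on the CPython version, so the sketch checks for both.

```python
import dis

# Hypothetical benchmark statements, for illustration only.
SETUP_PLAIN = "l = []"
STMT_PLAIN = "l.insert(0, None)"          # looks up .insert on every run
SETUP_BOUND = "l = []; ins = l.insert"    # lookup done once, in setup
STMT_BOUND = "ins(0, None)"               # timed statement has no lookup


def attribute_lookups(stmt):
    """Return the attribute names looked up each time `stmt` executes."""
    code = compile(stmt, "<bench>", "exec")
    return [
        instr.argval
        for instr in dis.get_instructions(code)
        if instr.opname in ("LOAD_ATTR", "LOAD_METHOD")
    ]


plain_lookups = attribute_lookups(STMT_PLAIN)
bound_lookups = attribute_lookups(STMT_BOUND)
print("plain:", plain_lookups)
print("bound:", bound_lookups)
```

The same setup/statement pairs could then be handed to perf timeit via its `-s` option; any remaining difference between the two timings would be the cost of the attribute lookup rather than of the insert.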

Antoine: Thanks for benchmarking it. After looking at the generated assembly[2], I found that ins1 was being inlined and that the call to memmove appeared before the loop (possibly because the compiler assumes that the call to memmove is more likely). I made a minor change and increased the threshold to 32. I’ve attached the generated assembly in a gist[3] (the relevant sequence is around line 8406, if you’re interested), and here’s the pyperformance comparison[4]. Could you please try benchmarking this version on your machine?

[1]: https://gist.github.com/jeethu/2d2de55afdb8ea4ad03b6a5d04d5227f
[2]: Generated with “gcc -DNDEBUG -fwrapv -O3 -std=c99 -I. -I./Include -DPy_BUILD_CORE -S -masm=intel Objects/listobject.c”
[3]: https://gist.github.com/jeethu/596bfc1251590bc51cc230046b52fb38
[4]: https://gist.github.com/jeethu/d6e4045f7932136548a806380eddd030

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32534>
_______________________________________