[issue32534] Speed-up list.insert: use memmove()

2019-04-08 Thread Inada Naoki
Change by Inada Naoki: resolution: -> wont fix; stage: patch review -> resolved; status: open -> closed

[issue32534] Speed-up list.insert: use memmove()

2018-05-18 Thread STINNER Victor
STINNER Victor added the comment: This issue is a micro-optimization which is only 1.08x faster: https://bugs.python.org/issue32534#msg310146 Moreover, it seems really hard to measure the benefit precisely on benchmarks. The results do not seem to be reliable. I suggest closing the issue as WONTFIX

[issue32534] Speed-up list.insert: use memmove()

2018-05-15 Thread Stéphane Wirtel
Stéphane Wirtel added the comment: Hi, just a small reminder for this issue, because I was reviewing the PR. What is the status? Thanks -- nosy: +matrixise

[issue32534] Speed-up list.insert: use memmove()

2018-01-18 Thread Jesse Bakker
Change by Jesse Bakker : -- nosy: +Jesse Bakker

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread Jeethu Rao
Jeethu Rao added the comment: It's also interesting that in https://gist.github.com/pitrou/29eb7592fa1eae2be390f3bfa3db0a3a :
| django_template | 307 ms | 312 ms | 1.02x slower | Not significant |
It seems to be slower and the benchmarks before it (2to3, c

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread STINNER Victor
STINNER Victor added the comment: In https://gist.github.com/jeethu/dc0811d415dd6d1a1621761e43842f88 I read:
| django_template | 160 ms | 129 ms | 1.24x faster | Significant (t=15.05) |
So on this benchmark, the optimization seems significant. But in the same paste, I see other benchmarks 1.2

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread Jeethu Rao
Jeethu Rao added the comment:
> What is 54640?
That's the pid of the process.
> I'm interested to know which benchmarks call list.insert() 40k times.
The django_template benchmark.
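A rough way to check from Python how often a workload calls list.insert(), without patching ins1 in listobject.c, is to count "c_call" profiling events with sys.setprofile. This is only a sketch under stated assumptions: count_list_inserts and workload are hypothetical names, and this is not the instrumentation that produced the output quoted in this thread.

```python
import sys


def count_list_inserts(func):
    """Run func() and count calls to the built-in list.insert method."""
    count = 0

    def profiler(frame, event, arg):
        nonlocal count
        # "c_call" is reported just before a C function is called;
        # arg is the C function object being called.
        if event == "c_call" and getattr(arg, "__name__", "") == "insert":
            if isinstance(getattr(arg, "__self__", None), list):
                count += 1

    sys.setprofile(profiler)
    try:
        func()
    finally:
        sys.setprofile(None)
    return count


def workload():
    # Hypothetical stand-in for a benchmark body.
    items = []
    for i in range(5):
        items.insert(0, i)
    items.append(99)  # not counted: different method


calls = count_list_inserts(workload)
```

Profiling adds overhead, so a count obtained this way is useful for finding out whether list.insert() is on a hot path, not for timing it.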

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread STINNER Victor
STINNER Victor added the comment:
> Sure! https://gist.github.com/jeethu/000a2d3ecd9033c0ef51331f062ac294
I don't understand how to read this output. "54640 ins1 called 40050 times" What is 54640? I'm interested to know which benchmarks call list.insert() 40k times.

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread Jeethu Rao
Jeethu Rao added the comment:
> > I still think those numbers are misleading or downright bogus. There is no existing proof that list.insert() is a critical path in those benchmarks.
> Can someone check if these benchmarks really use list.insert() in hot code? If yes, why? :-) The cost

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread STINNER Victor
STINNER Victor added the comment:
> I still think those numbers are misleading or downright bogus. There is no existing proof that list.insert() is a critical path in those benchmarks.
Can someone check if these benchmarks really use list.insert() in hot code? If yes, why? :-) The cost of l

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread Antoine Pitrou
Antoine Pitrou added the comment: On 17/01/2018 at 14:36, Jeethu Rao wrote:
> On the other hand, on the pyperformance comparison I'd posted yesterday[1], there seems to be an average improvement of 1.27x on the first seven benchmarks, and the worst slowdown is only 1.03x.
I still

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread Jeethu Rao
Jeethu Rao added the comment:
> FWIW, we've encountered a number of situations in the past when something that improved the timings on one compiler would make timings worse on another compiler. There was also variance between timings on 32-bit builds versus 64-bit builds.
I've verifie

[issue32534] Speed-up list.insert: use memmove()

2018-01-17 Thread STINNER Victor
STINNER Victor added the comment: https://gist.github.com/jeethu/19430d802aa08e28d1cb5eb20a47a470
Mean +- std dev: 10.5 us +- 1.4 us => Mean +- std dev: 9.68 us +- 0.89 us
It's 1.08x faster (-7.8%). It's small for a microbenchmark; usually an optimization should make a *microbenchmark* at lea
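The "1.08x faster (-7.8%)" figure follows directly from the two quoted means. A minimal sketch of that arithmetic (the helper name speedup is an assumption, not part of perf):

```python
def speedup(before, after):
    """Return (ratio, percent_change) for two mean timings in the same unit."""
    ratio = before / after                              # > 1 means faster
    percent_change = (after - before) / before * 100.0  # negative means less time
    return ratio, percent_change


# Means quoted above, in microseconds.
ratio, change = speedup(10.5, 9.68)
print(f"{ratio:.2f}x faster ({change:+.1f}%)")  # prints "1.08x faster (-7.8%)"
```

Note that the 0.82 us difference between the means is smaller than the standard deviation reported for either run (1.4 us and 0.89 us).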

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread Raymond Hettinger
Raymond Hettinger added the comment: FWIW, we've encountered a number of situations in the past when something that improved the timings on one compiler would make timings worse on another compiler. There was also variance between timings on 32-bit builds versus 64-bit builds.

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread Jeethu Rao
Jeethu Rao added the comment:
> Be careful. Moving the "l.insert" lookup out of the loop might make the code slower. I never looked into why. But Python 3.7 was also optimized in many places to call methods, so I'm not sure anymore :)
Thanks again! Here's a gist without the hack[1].
[1]: https://gist

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread STINNER Victor
STINNER Victor added the comment:
> I’ve run the benchmark that you've suggested with a minor change (to avoid the cost of LOAD_ATTR)
Be careful. Moving the "l.insert" lookup out of the loop might make the code slower. I never looked into why. But Python 3.7 was also optimized in many places to call me
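For readers unfamiliar with the trick being discussed, the two loop shapes look roughly like this. It is a sketch that assumes the "minor change" was binding l.insert to a local name before the loop; the function names are hypothetical.

```python
def insert_with_lookup(n):
    """Look up the insert attribute on the list on every iteration."""
    l = []
    for _ in range(n):
        l.insert(0, None)
    return l


def insert_hoisted(n):
    """Look up the bound method once, then call it through a local name."""
    l = []
    insert = l.insert
    for _ in range(n):
        insert(0, None)
    return l
```

Both produce the same list; which one is faster depends on the interpreter version, as the comment above cautions, so it has to be measured rather than assumed.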

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread Jeethu Rao
Jeethu Rao added the comment: Victor: I’m booting with the isolcpus and rcu_nocbs flags, and running pyperformance with the --affinity flag to pin the benchmark to the isolated CPU cores. I’ve also run `perf system tune`. And the OS is Ubuntu 17.10. Thanks for the tip on using perf timeit ins

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Ok, I ran the benchmarks here (Ubuntu 16.04, Core i5-2500K, PGO and LTO disabled) and I don't get any consistent speedup, which is more in line with what I was expecting: https://gist.github.com/pitrou/29eb7592fa1eae2be390f3bfa3db0a3a

[issue32534] Speed-up list.insert: use memmove()

2018-01-16 Thread STINNER Victor
Change by STINNER Victor : -- title: Speed-up list.insert -> Speed-up list.insert: use memmove()

[issue32534] Speed-up list.insert

2018-01-16 Thread STINNER Victor
STINNER Victor added the comment:
> jeethu@dev:cpython (3.7_list_insert_memmove)$ ./python -m timeit -s "l = []" "for _ in range(100): l.insert(0, None)"
Please don't use timeit; use perf timeit to run such a *microbenchmark* (a timing shorter than 1 ms). Your benchmark also measures the perfor
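One property of the quoted command is worth spelling out. This is an observation about how the standard timeit module works, not necessarily the point of the truncated comment above: the setup statement runs once per timing run, so the list keeps growing across loop iterations and later insertions at index 0 shift more elements than earlier ones. A small sketch, where holder is a hypothetical helper used only to inspect the list afterwards:

```python
import timeit

holder = []  # lets us look at the list that the setup statement creates

timer = timeit.Timer(
    stmt="for _ in range(100): l.insert(0, None)",
    setup="l = []; holder.append(l)",
    globals={"holder": holder},
)

timer.timeit(number=10)  # setup runs once, stmt runs 10 times
final_length = len(holder[0])
```

After 10 loops the list holds 1000 elements rather than 100, so the measured time depends on the number of loops timeit chooses.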

[issue32534] Speed-up list.insert

2018-01-16 Thread Antoine Pitrou
Antoine Pitrou added the comment: Thanks. That's really surprising. I'll give it a try myself.

[issue32534] Speed-up list.insert

2018-01-16 Thread Jeethu Rao
Jeethu Rao added the comment: Built and benchmarked both the baseline and the patch without PGO; the differences are less pronounced, but still present. https://gist.github.com/jeethu/abd404e39c6dfcbabb4c01661b9238d1

[issue32534] Speed-up list.insert

2018-01-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: Perhaps the patch is interfering weirdly with PGO?
> Should I run the benchmark without a PGO build (i.e. without --enable-optimizations)?
That would help clear things up, IMHO.

[issue32534] Speed-up list.insert

2018-01-15 Thread Jeethu Rao
Jeethu Rao added the comment: I rebased my branch off of master and rebuilt it, and also rebuilt the baseline from master. Both versions were configured with --with-lto and --enable-optimizations. The benchmark numbers are rather different this time[1]. pidigits is slower, but nbody is still

[issue32534] Speed-up list.insert

2018-01-15 Thread Antoine Pitrou
Antoine Pitrou added the comment: I'm quite surprised so many benchmarks would be sped up so significantly by a list.insert() optimization (why a 27% speedup on computing digits of pi, or a 33% speedup on an N-body simulation?). Are you sure the two executables are similarly compiled?

[issue32534] Speed-up list.insert

2018-01-14 Thread Raymond Hettinger
Raymond Hettinger added the comment: The result likely varies quite a bit from compiler to compiler, processor to processor, and OS to OS. It may also be affected by data size and caching. -- nosy: +rhettinger

[issue32534] Speed-up list.insert

2018-01-14 Thread Jeethu Rao
Jeethu Rao added the comment: I managed to tune an i7-7700K desktop running Ubuntu 17.10 per this doc[1], and ran the pyperformance benchmarks[2]. I also tried various thresholds with this benchmark and 16 still seems to be the sweet spot. The geometric mean of the relative changes across all ben
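For reference, a geometric mean of per-benchmark ratios can be computed as follows. This is a minimal sketch: the function name is an assumption, and the sample ratios are made up for illustration only; they are not results from this thread.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive ratios, computed via logarithms."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Illustrative values only (patched time / baseline time per benchmark).
sample = [0.80, 1.02, 0.93]
gm = geometric_mean(sample)
```

The geometric mean is the usual choice for averaging ratios because a 2x speedup and a 2x slowdown cancel out to exactly 1.0, which an arithmetic mean would not give.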

[issue32534] Speed-up list.insert

2018-01-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: In your benchmarks the difference between thresholds is only 4%. It would not be worth keeping a special case for such a small benefit. But note that in your benchmarks you inserted into a list with a size of up to 5 elements. The value of the threshold affe

[issue32534] Speed-up list.insert

2018-01-11 Thread Jeethu Rao
Jeethu Rao added the comment: I tried it with a couple of different thresholds, twice each, ignoring the results of the first run. 16 seems to be the sweet spot.
THRESHOLD = 0
jeethu@dev:cpython (3.7_list_insert_memmove)$ ./python -m timeit -s "l = []" "for _ in range(100): l.insert(0, None)

[issue32534] Speed-up list.insert

2018-01-11 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: What are the results when the threshold is set to 0 or 1? -- nosy: +serhiy.storchaka

[issue32534] Speed-up list.insert

2018-01-11 Thread Jeethu Rao
Change by Jeethu Rao : -- keywords: +patch pull_requests: +5017 stage: -> patch review

[issue32534] Speed-up list.insert

2018-01-11 Thread Jeethu Rao
New submission from Jeethu Rao: I've noticed that replacing the for loop in the ins1 function in listobject.c with a memmove when the number of pointers to move is greater than 16 seems to speed up list.insert by about 3 to 4x on a contrived benchmark.
# Before
jeethu@dev:cpython (master)$ .
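The idea can be modelled in Python as below. This is only an illustrative model and not CPython's C code: a preallocated Python list stands in for the pointer array, slice assignment stands in for memmove, and all function names are hypothetical. Only the threshold of 16 comes from the report above.

```python
THRESHOLD = 16  # cut-over point reported in the submission


def shift_right_loop(items, where, n):
    """Shift items[where:n] one slot to the right, one element at a time."""
    for i in range(n - 1, where - 1, -1):
        items[i + 1] = items[i]


def shift_right_bulk(items, where, n):
    """Shift the same range in one bulk operation (stand-in for memmove)."""
    items[where + 1:n + 1] = items[where:n]


def insert_model(items, n, where, value):
    """Insert value at index where in a buffer holding n live items.

    The buffer must have at least n + 1 slots. Returns the new item count.
    """
    if n - where > THRESHOLD:
        shift_right_bulk(items, where, n)
    else:
        shift_right_loop(items, where, n)
    items[where] = value
    return n + 1


# 20 live items plus one spare slot; inserting at the front moves 20 items.
buf = list(range(20)) + [None]
count = insert_model(buf, 20, 0, -1)
```

Both paths give the same result; the proposal is purely about which one the C code uses to move the pointers.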