Jeethu Rao added the comment:
It's also interesting that in
https://gist.github.com/pitrou/29eb7592fa1eae2be390f3bfa3db0a3a :
| django_template | 307 ms | 312 ms | 1.02x slower | Not significant |
It seems to be slower, and the benchmarks before it
New submission from Jeethu Rao :
In one of the patches I'm building (yet another attempt at caching
LOAD_GLOBALS)[1], I'm using the private APIs from PEP 523 to store an array
with every code object. I'm calling _PyEval_RequestCodeExtraIndex with
PyMem_Free for the freefunc
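The mechanism described here is C-level: `_PyEval_RequestCodeExtraIndex()` reserves a co_extra slot (per PEP 523) and `PyMem_Free` is registered as the freefunc for the stored array. As a rough pure-Python illustration only of the underlying idea, one cache slot per LOAD_GLOBAL name in a code object, here is a sketch (the names and structure below are my own, not the patch's):

```python
# Pure-Python analog (my illustration, NOT the actual C mechanism from the
# patch): enumerate the names a code object loads via LOAD_GLOBAL and build
# one cache entry per name, resolved against the function's globals.
import dis

def global_name_cache(func):
    code = func.__code__
    # one cache slot per LOAD_GLOBAL site in the code object
    names = [ins.argval for ins in dis.get_instructions(code)
             if ins.opname == "LOAD_GLOBAL"]
    # builtins like len/range are not in __globals__, so those slots start
    # out as None here; the real patch resolves them at the C level
    return {name: func.__globals__.get(name) for name in names}

def f():
    return len(range(3))

cache = global_name_cache(f)
print(sorted(cache))  # → ['len', 'range']
```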
Jeethu Rao added the comment:
> What is 54640?
That's the pid of the process.
> I'm interested to know which benchmarks call list.insert() 40k times.
The django_template benchmark.
Jeethu Rao added the comment:
> > I still think those numbers are misleading or downright bogus. There is no
> > existing proof that list.insert() is a critical path in those benchmarks.
> Can someone check if these benchmarks really use list.insert() in hot code? If
> yes,
Jeethu Rao added the comment:
> FWIW, we've encountered a number of situations in the past when something
> that improved the timings on one compiler would make timings worse on another
> compiler. There was also variance between timings on 32-bit builds versus
> 64-
Change by Jeethu Rao :
--
nosy: +jeethu
___
Python tracker
<https://bugs.python.org/issue30604>
Change by Jeethu Rao :
--
nosy: +jeethu
___
Python tracker
<https://bugs.python.org/issue28521>
Jeethu Rao added the comment:
> Be careful. Moving "l.insert" lookup of the loop might make the code slower.
> I never looked why. But Python 3.7 was also optimized in many places to call
> methods, so I'm not sure anymore :)
Thanks again! Here's a gist wit
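Victor's caveat above is easy to check directly. The following is a minimal sketch of my own (not the contents of the gist) comparing the two forms; correctness is asserted, and the timing winner genuinely can flip between 3.x versions because of the method-call optimizations he mentions:

```python
# Sketch: does hoisting the l.insert lookup out of the loop help?
# The answer varies by CPython version, so measure rather than assume.
import timeit

def insert_with_lookup(n):
    l = []
    for _ in range(n):
        l.insert(0, None)   # attribute lookup on every iteration
    return l

def insert_hoisted(n):
    l = []
    ins = l.insert          # bound method looked up once
    for _ in range(n):
        ins(0, None)
    return l

# both forms must produce the same result
assert insert_with_lookup(100) == insert_hoisted(100)

for fn in (insert_with_lookup, insert_hoisted):
    t = timeit.timeit(lambda: fn(100), number=1000)
    print(f"{fn.__name__}: {t:.4f}s")
```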
Jeethu Rao added the comment:
Victor: I’m booting with the isolcpus and rcu_nocbs flags, and running
pyperformance with the --affinity flag to pin the benchmark to the isolated CPU
cores. I’ve also run `python -m perf system tune`, and the OS is Ubuntu 17.10. Thanks for
the tip on using perf timeit
Jeethu Rao added the comment:
Built and benchmarked both the baseline and the patch without PGO; the
differences are less pronounced, but still present.
https://gist.github.com/jeethu/abd404e39c6dfcbabb4c01661b9238d1
Jeethu Rao added the comment:
I rebased my branch off of master and rebuilt it, and also rebuilt the baseline
from master. Both versions were configured with --with-lto and
--enable-optimizations. The benchmark numbers are rather different this
time[1]. pidigits is slower, but nbody is still
Jeethu Rao added the comment:
I managed to tune an i7-7700K desktop running Ubuntu 17.10 per this doc[1], and
ran the pyperformance benchmarks[2].
I also tried various thresholds with this benchmark and 16 still seems to be the
sweet spot.
The geometric mean of the relative changes across all
Jeethu Rao added the comment:
I tried it with a couple of different thresholds, twice each, ignoring the
results of the first run. 16 seems to be the sweet spot.
THRESHOLD = 0
jeethu@dev:cpython (3.7_list_insert_memmove)$ ./python -m timeit -s "l = []"
"for _ in range(100): l
Change by Jeethu Rao :
--
keywords: +patch
pull_requests: +5017
stage: -> patch review
___
Python tracker
<https://bugs.python.org/issue32534>
New submission from Jeethu Rao :
I've noticed that replacing the for loop in the ins1 function in listobject.c
with a memmove when the number of pointers to move is greater than 16 seems to
speed up list.insert by about 3 to 4x on a contrived benchmark.
# Before
jeethu@dev:cpython (m
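The effect being described, replacing ins1's element-by-element pointer shift with one overlapping block copy, can be modelled in pure Python with `ctypes.memmove`. This is a sketch of the idea only (the actual change is C code in listobject.c; the helper below and its names are mine):

```python
# Model of the patched ins1: shift the slots arr[idx:n] up by one with a
# single memmove (which handles the overlapping ranges), then write the
# new value into the freed slot at idx.
import ctypes

def insert_via_memmove(arr, n, idx, value):
    """Insert value at idx into a ctypes array holding n items
    (capacity must be at least n + 1)."""
    itemsize = ctypes.sizeof(arr._type_)
    base = ctypes.addressof(arr)
    ctypes.memmove(base + (idx + 1) * itemsize,   # destination: idx+1
                   base + idx * itemsize,         # source: idx
                   (n - idx) * itemsize)          # bytes for n-idx items
    arr[idx] = value

arr = (ctypes.c_long * 8)(10, 20, 30, 40, 0, 0, 0, 0)
insert_via_memmove(arr, 4, 1, 99)
print(list(arr[:5]))  # → [10, 99, 20, 30, 40]
```

In the C patch the same single memmove replaces a loop over `Py_ssize_t` indices moving one `PyObject *` per iteration, which is why the win only appears once the number of pointers to move crosses a threshold.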