On Thu, Jun 18, 2020 at 4:25 PM Khem Raj <raj.k...@gmail.com> wrote:
> On Monday, June 15, 2020 1:33:26 PM PDT Ryan Rowe wrote:
> > On 14/6/20, 18:05, "Mittal, Anuj" <anuj.mit...@intel.com> wrote:
> > > On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > > > Hello Alex,
> > > >
> > > > I’m investigating Python 3 performance issues on a Raspberry Pi
> > > > Yocto build; I appreciate any insights you can provide into the
> > > > problem.
> > > >
> > > > In my investigation, I noticed that PGO was disabled in all cases
> > > > due to a small bug. I fixed it in a patch submitted to OE-Core
> > > > (#139459). Even with PGO enabled, Yocto-compiled Python 3.8.3
> > > > runs significantly slower than the same version compiled on
> > > > Raspbian.
> > > >
> > > > In your patch,
> > > > 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch,
> > > > I see that you override the default PROFILE_TASK, which did not
> > > > explicitly specify test suites, with a command that explicitly
> > > > provides test suites. How did you decide on these tests? The
> > > > standard PGO command runs 43 tests, while you specify 7. When I
> > > > compile Python 3.8.3 on Raspbian, I see no intersection between
> > > > the 43 tests run by default and the 7 you specify. Additionally,
> > > > the default module for PROFILE is test while you use
> > > > test.regrtest.
> > >
> > > We used to run pybench and then switched to regrtest:
> > >
> > > https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
> > >
> > > It looks like the PROFILE_TASK value was changed recently:
> > >
> > > https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
> > >
> > > If the performance is actually degrading, maybe we should change it
> > > to something more useful. Do you know how much time the default set
> > > of tasks takes to run in qemu?
> > >
> > > Thanks,
> > >
> > > Anuj
> >
> > Thanks for looking into this. It took me about 20 minutes to run the
> > PGO tests and I did notice a significant improvement in Python
> > runtime. However, that is compared against a non-PGO build. I have
> > not compared the existing PGO arguments against the new upstream
> > arguments.
> >
> > We've come to realize that our performance issues are not due to
> > Python, but in fact a much deeper-rooted issue. Simple C code takes
> > 2-3 times longer to run on our image based on meta-raspberrypi's
> > raspberrypi4 machine than on stock Raspbian.
> >
> > On a side note, it seems that CPython now exposes PROFILE_TASK as a
> > configuration option, so we can override that variable with our
> > desired profiling arguments rather than modifying the Makefile
> > directly with a patch.
>
> The patch 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch
> seems to hardcode what tests to run; perhaps it would be better to use
> PROFILE_TASK.
>
> When the 3.5 -> 3.7 upgrade was done in
>
> https://git.openembedded.org/openembedded-core/commit/?id=02714c105426b0d687620913c1a7401b386428b6
>
> it silently dropped the use of PYTHON3_PROFILE_TASK, among the large
> swath of changes this patch carried. I guess we have not checked the
> py3 runtime performance to detect this regression.

Are we sure there is a regression? Ryan posted a follow-up saying
everything was slower in his tests, not just Python.

> so it will be good to reinstate the variable to choose which tests one
> wants to run, with the defaults being whatever is optimal for the
> autobuilder.
>
> > Thanks,
> > Ryan
> >
> > > > For reference, here are the results of a simple CPU-bound test.
> > > > These tests were run on the same Raspberry Pi 4 with the same SD
> > > > card.
> > > >
> > > > python3 -m timeit -r 10 --setup '
> > > > def fib(n):
> > > >     if n < 2:
> > > >         return n
> > > >     if n == 2:
> > > >         return 1
> > > >     return fib(n - 1) + fib(n - 2)
> > > > ' '[fib(n) for n in range(20)]'
> > > >
> > > > # Yocto Python 3.8.3
> > > > # 10 loops, best of 10: 28.9 msec per loop
> > > > # 10 loops, best of 10: 29.3 msec per loop
> > > > # 10 loops, best of 10: 27.9 msec per loop
> > > > # 10 loops, best of 10: 30.4 msec per loop
> > > > # Average result: 31.625 msec per loop
> > > >
> > > > # Raspbian Python 3.8.3
> > > > # 50 loops, best of 10: 7.73 msec per loop
> > > > # 50 loops, best of 10: 7.72 msec per loop
> > > > # 50 loops, best of 10: 7.67 msec per loop
> > > > # 50 loops, best of 10: 7.74 msec per loop
> > > > # Average result: 7.715 msec per loop
> > > >
> > > > # Raspbian speedup: 4.09x
> > > >
> > > > Best,
> > > > Ryan Rowe
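To repeat the quoted fib measurement with identical settings on both images, the one-liner can also be run as a small script. This is only a sketch built on the standard timeit module. The helper names best_msec and speedup are illustrative and do not come from the thread, and the loop count is fixed explicitly here rather than auto-selected as the command-line tool does.

```python
import timeit

# Same setup code and statement as the quoted "python3 -m timeit" command.
SETUP = """
def fib(n):
    if n < 2:
        return n
    if n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)
"""
STMT = "[fib(n) for n in range(20)]"


def best_msec(number=10, repeat=10):
    """Return the best per-loop time in milliseconds (hypothetical helper)."""
    # timeit.repeat returns one total time (in seconds) per repetition,
    # each covering `number` executions of STMT.
    totals = timeit.repeat(STMT, setup=SETUP, number=number, repeat=repeat)
    return min(totals) / number * 1000.0


def speedup(slow_msec, fast_msec):
    """Ratio of two per-loop timings (hypothetical helper)."""
    return slow_msec / fast_msec


if __name__ == "__main__":
    print(f"best of 10: {best_msec():.2f} msec per loop")
```

Running the same file on the Yocto image and on Raspbian and passing the two results to speedup() gives the ratio directly; with the averages quoted above, speedup(31.625, 7.715) comes out at roughly 4.1.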