On 14/6/20, 18:05, "Mittal, Anuj" <anuj.mit...@intel.com> wrote:

> On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > Hello Alex,
> >
> > I’m investigating Python 3 performance issues on a Raspberry Pi Yocto
> > build; I appreciate any insights you can provide into the problem.
> >
> > In my investigation, I noticed that PGO was disabled in all cases due
> > to a small bug. I fixed it in a patch submitted to OE-Core (#139459).
> > Even when PGO is indeed enabled, Python 3 runs significantly slower
> > on Yocto-compiled Python 3.8.3 than the same version compiled on
> > Raspbian.
> >
> > In your patch,
> > 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch,
> > I see that you override the default PROFILE_TASK, which did not
> > explicitly specify test suites, to a command that explicitly
> > provides test suites. How did you decide on these tests?
> > The standard PGO command runs 43 tests, while you specify 7. When I
> > compile Python 3.8.3 on Raspbian, I see no intersection between the
> > 43 tests run by default and the 7 you specify. Additionally, the
> > default module for PROFILE is test while you use test.regrtest.
>
> We used to run pybench and then switched to regrtest:
>
> https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
>
> It looks like the PROFILE_TASK value was changed recently:
>
> https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
>
> If the performance is actually degrading, maybe we should change it to
> something more useful. Do you know how much time the default set of
> tasks takes to run in qemu?
>
> Thanks,
>
> Anuj

Thanks for looking into this. It took me about 20 minutes to run the PGO
tests, and I did notice a significant improvement in Python runtime.
However, that is compared against a non-PGO build. I have not compared the
existing PGO arguments against the new upstream arguments.

We've come to realize that our performance issues are not due to Python,
but to a much more deeply rooted issue. Simple C code takes 2-3 times
longer to run on our image, which is based on meta-raspberrypi's
raspberrypi4 machine, than on stock Raspbian.

On a side note, it seems that CPython now exposes PROFILE_TASK as a
configuration option, so we can override that variable with our desired
profiling arguments rather than modifying the Makefile directly with a
patch.

Thanks,
Ryan

> >
> > For reference, here’s the results of a simple CPU-bound test. These
> > tests were run on the same Raspberry Pi 4 with same SD card.
> >
> > python3 -m timeit -r 10 --setup '
> > def fib(n):
> >     if n < 2:
> >         return n
> >     if n == 2:
> >         return 1
> >     return fib(n - 1) + fib(n - 2)
> > ' '[fib(n) for n in range(20)]'
> >
> > # Yocto Python 3.8.3
> > # 10 loops, best of 10: 28.9 msec per loop
> > # 10 loops, best of 10: 29.3 msec per loop
> > # 10 loops, best of 10: 27.9 msec per loop
> > # 10 loops, best of 10: 30.4 msec per loop
> > # Average result: 31.625 msec per loop
> >
> > # Raspbian Python 3.8.3
> > # 50 loops, best of 10: 7.73 msec per loop
> > # 50 loops, best of 10: 7.72 msec per loop
> > # 50 loops, best of 10: 7.67 msec per loop
> > # 50 loops, best of 10: 7.74 msec per loop
> > # Average result: 7.715 msec per loop
> >
> > # Raspbian speedup: 4.09x
> >
> > Best,
> > Ryan Rowe
> >
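
In case it helps anyone repeat the comparison from a script rather than the
command line, here is a rough sketch of the same CPU-bound test using the
standard timeit module. The helper names best_msec_per_loop and speedup are
made up for illustration, and the loop count is fixed at 10 here, whereas
the timeit command line picks it automatically, so the numbers will not
match the quoted runs exactly.

```python
import timeit


def fib(n):
    # Same recursive Fibonacci as in the quoted timeit command.
    if n < 2:
        return n
    if n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)


def best_msec_per_loop(repeat=10, number=10):
    """Return the best (minimum) time per loop, in milliseconds.

    Hypothetical helper: 'repeat' mirrors "-r 10"; 'number' is the loop
    count, which is an assumption here (the CLI chooses it automatically).
    """
    timings = timeit.repeat(
        stmt="[fib(n) for n in range(20)]",
        globals={"fib": fib},
        repeat=repeat,
        number=number,
    )
    # Each entry is the total time for 'number' loops, in seconds.
    return min(timings) / number * 1000.0


def speedup(slow_msec, fast_msec):
    """Hypothetical helper: ratio of the slower timing to the faster one."""
    return slow_msec / fast_msec


if __name__ == "__main__":
    print(f"best of 10: {best_msec_per_loop():.2f} msec per loop")
```

Running the same script on both images and passing the two results to
speedup() gives the ratio between them.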