On 14/6/20, 18:05, "Mittal, Anuj" <anuj.mit...@intel.com> wrote:
> On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > Hello Alex,
> >
> > I’m investigating Python 3 performance issues on a Raspberry Pi Yocto
> > build; I appreciate any insights you can provide into the problem.
> >
> > In my investigation, I noticed that PGO was disabled in all cases due
> > to a small bug. I fixed it in a patch submitted to OE-Core (#139459).
> > Even with PGO enabled, Yocto-compiled Python 3.8.3 runs significantly
> > slower than the same version compiled on Raspbian.
> >
> > In your patch, 0001-Makefile.pre-use-qemu-wrapper-when-gathering-
> > profile.patch, I see that you override the default PROFILE_TASK,
> > which did not explicitly specify test suites, to a command that
> > explicitly provides test suites. How did you decide on these tests?
> > The standard PGO command runs 43 tests, while you specify 7. When I
> > compile Python 3.8.3 on Raspbian, I see no intersection between the
> > 43 tests run by default and the 7 you specify. Additionally, the
> > default module for PROFILE is test while you use test.regrtest.
>
> We used to run pybench and then switched to regrtest:
>
> https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
>
> It looks like the PROFILE_TASK value was changed recently:
>
> https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
>
> If the performance is actually degrading, maybe we should change it to
> something more useful. Do you know how much time the default set of
> tasks takes to run in qemu?
>
> Thanks,
>
> Anuj

Thanks for looking into this. It took me about 20 minutes to run the PGO
tests and I did notice a significant improvement in Python runtime.
However, that is compared against a non-PGO build. I have not compared
the existing PGO arguments against the new upstream arguments.

We've come to realize that our performance issues are not due to Python
but to a more deeply rooted issue. Simple C code takes 2-3 times longer
to run on our image, which is based on meta-raspberrypi's raspberrypi4
machine, than on stock Raspbian.

On a side note, it seems that CPython now exposes PROFILE_TASK as a
configuration option, so we can override that variable with our
desired profiling arguments rather than patching the Makefile
directly.
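
As a rough sketch of what that override could look like, the snippet
below assembles a configure invocation that passes PROFILE_TASK as a
VAR=value argument. The helper name configure_command and the value
"-m test --pgo" are illustrative assumptions, not the arguments any
recipe actually uses.

```python
import shlex


def configure_command(profile_task, extra_args=()):
    """Build a CPython ./configure argv with PGO and a custom PROFILE_TASK.

    Hypothetical helper for illustration only; it builds the command
    line but does not run it.
    """
    return [
        "./configure",
        "--enable-optimizations",         # turns on the PGO build
        *extra_args,
        f"PROFILE_TASK={profile_task}",   # args handed to the profiling run
    ]


# Illustrative value only; substitute the desired profiling arguments.
cmd = configure_command("-m test --pgo")

# Print a copy-pasteable, correctly quoted shell command.
print(shlex.join(cmd))
```

In a recipe, the same VAR=value argument would presumably be appended
to the configure options instead of being built by hand like this.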

Thanks,
Ryan

> >
> > For reference, here are the results of a simple CPU-bound test. These
> > tests were run on the same Raspberry Pi 4 with the same SD card.
> >
> > python3 -m timeit -r 10 --setup '
> > def fib(n):
> >  if n < 2:
> >    return n
> >  if n == 2:
> >    return 1
> >  return fib(n - 1) + fib(n - 2)
> > ' '[fib(n) for n in range(20)]'
> >
> > # Yocto Python 3.8.3
> > # 10 loops, best of 10: 28.9 msec per loop
> > # 10 loops, best of 10: 29.3 msec per loop
> > # 10 loops, best of 10: 27.9 msec per loop
> > # 10 loops, best of 10: 30.4 msec per loop
> > # Average result: 29.125 msec per loop
> >
> > # Raspbian Python 3.8.3
> > # 50 loops, best of 10: 7.73 msec per loop
> > # 50 loops, best of 10: 7.72 msec per loop
> > # 50 loops, best of 10: 7.67 msec per loop
> > # 50 loops, best of 10: 7.74 msec per loop
> > # Average result: 7.715 msec per loop
> >
> > # Raspbian speedup: 3.78x
> >
> > Best,
> > Ryan Rowe
> > 
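
The averages and the speedup ratio can be recomputed from the per-run
figures listed in the quoted message. The sketch below uses only the
four timings shown for each system; any runs not listed there are not
accounted for, and the variable and function names are illustrative.

```python
from statistics import mean

# Per-run "best of 10" timings in msec per loop, as listed in the message.
yocto_ms = [28.9, 29.3, 27.9, 30.4]
raspbian_ms = [7.73, 7.72, 7.67, 7.74]


def summarize(slow_runs, fast_runs):
    """Return (mean of slow runs, mean of fast runs, slow/fast ratio)."""
    slow_avg = mean(slow_runs)
    fast_avg = mean(fast_runs)
    return slow_avg, fast_avg, slow_avg / fast_avg


yocto_avg, raspbian_avg, speedup = summarize(yocto_ms, raspbian_ms)

print(f"Yocto average:    {yocto_avg:.3f} msec per loop")
print(f"Raspbian average: {raspbian_avg:.3f} msec per loop")
print(f"Raspbian speedup: {speedup:.2f}x")
```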

View/Reply Online (#139536): 
https://lists.openembedded.org/g/openembedded-core/message/139536