On Thu, Jun 18, 2020 at 4:25 PM Khem Raj <raj.k...@gmail.com> wrote:
>
> On Monday, June 15, 2020 1:33:26 PM PDT Ryan Rowe wrote:
> > On 14/6/20, 18:05, "Mittal, Anuj" <anuj.mit...@intel.com> wrote:
> >
> > > On Fri, 2020-06-12 at 21:28 +0000, Ryan Rowe wrote:
> > >
> > > > Hello Alex,
> > > >
> > > >
> > > >
> > > > I’m investigating Python 3 performance issues on a Raspberry Pi Yocto
> > > > build; I appreciate any insights you can provide into the problem.
> > > >
> > > >
> > > >
> > > > In my investigation, I noticed that PGO was disabled in all cases due
> > > > to a small bug. I fixed it in a patch submitted to OE-Core (#139459).
> > > > Even with PGO enabled, the Yocto-compiled Python 3.8.3 runs
> > > > significantly slower than the same version compiled on Raspbian.
> > > >
> > > >
> > > >
> > > > In your patch,
> > > > 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch, I see
> > > > that you override the default PROFILE_TASK, which did not explicitly
> > > > specify test suites, with a command that explicitly lists them. How
> > > > did you decide on these tests? The standard PGO command runs 43 tests,
> > > > while you specify 7. When I compile Python 3.8.3 on Raspbian, I see no
> > > > intersection between the 43 tests run by default and the 7 you
> > > > specify. Additionally, the default module for PROFILE is test while
> > > > you use test.regrtest.
> > >
> > >
> > >
> > > > We used to run pybench and then switched to regrtest:
> > > >
> > > > https://git.yoctoproject.org/cgit/cgit.cgi/poky/commit/?id=d9f7b9d3ad44195e68b2c1b09e3eb42e623c9a20
> > > >
> > > > It looks like the PROFILE_TASK value was changed recently:
> > > >
> > > > https://github.com/python/cpython/commit/2406672984e4c1b18629e615edad52928a72ffcc#diff-45e8b91057f0c5b60efcb5944125b585
> > > >
> > > > If the performance is actually degrading, maybe we should change it to
> > > > something more useful. Do you know how much time the default set of
> > > > tasks takes to run in qemu?
> > >
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Anuj
> >
> >
> > Thanks for looking into this. It took me about 20 minutes to run the PGO
> > tests and I did notice a significant improvement in Python runtime.
> > However, that is compared against a non-PGO build. I have not compared
> > the existing PGO arguments against the new upstream arguments.
> >
> > > We've come to realize that our performance issues are not due to Python,
> > > but to a much more deep-rooted issue. Simple C code takes 2-3 times
> > > longer to run on our image based on meta-raspberrypi's raspberrypi4
> > > machine than on stock Raspbian.
> > >
> > > On a side note, it seems that CPython now exposes PROFILE_TASK as a
> > > configuration option, so we can override that variable with our
> > > desired profiling arguments rather than modifying the Makefile
> > > directly with a patch.
> >
>
> The patch 0001-Makefile.pre-use-qemu-wrapper-when-gathering-profile.patch
> seems to hardcode which tests to run; perhaps it would be better to use
> PROFILE_TASK.
>
> When the 3.5 -> 3.7 upgrade was done in
>
> https://git.openembedded.org/openembedded-core/commit/?id=02714c105426b0d687620913c1a7401b386428b6
>
> it silently dropped the use of PYTHON3_PROFILE_TASK, among the large swath of
> changes that patch carried. I guess we have not checked the py3 runtime
> performance to detect this regression.

Are we sure there is a regression? Ryan posted a follow-up saying
everything was slower in his tests, not just Python.
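
One way to check, as a rough sketch: run the same fib workload from Ryan's
original message on both images and compare the numbers. The script below is
only an illustration. The bench() helper and its repeat/number defaults are
assumptions chosen to mirror the quoted "python3 -m timeit -r 10" invocation;
nothing here comes from the recipe or the patch.

```python
import statistics
import timeit


def fib(n):
    # Same workload as the fib() in the quoted timeit command.
    if n < 2:
        return n
    if n == 2:
        return 1
    return fib(n - 1) + fib(n - 2)


def bench(repeat=10, number=10):
    """Return per-loop times in msec, one entry per repeat.

    Hypothetical helper: the repeat/number defaults are assumptions,
    picked to resemble 'timeit -r 10' with 10 loops per measurement.
    """
    totals = timeit.repeat(
        stmt="[fib(n) for n in range(20)]",
        globals={"fib": fib},
        repeat=repeat,
        number=number,
    )
    # timeit reports seconds for `number` loops; convert to msec per loop.
    return [total / number * 1000.0 for total in totals]


if __name__ == "__main__":
    times = bench()
    print(f"best of {len(times)}: {min(times):.2f} msec per loop")
    print(f"mean of {len(times)}: {statistics.mean(times):.2f} msec per loop")
```

Running it once with the current PROFILE_TASK and once with the upstream
default on the same board would show whether the choice of profiling tests
matters, separately from the system-wide slowdown Ryan described.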

> So it would be good to reinstate the variable to choose which tests one wants
> to run, with the defaults being whatever is optimal for the autobuilder.
>
> > Thanks,
> > Ryan
> >
> >
> > > >
> > > >
> > > > > For reference, here are the results of a simple CPU-bound test. These
> > > > > tests were run on the same Raspberry Pi 4 with the same SD card.
> > > >
> > > >
> > > >
> > > > > python3 -m timeit -r 10 --setup '
> > > > > def fib(n):
> > > > >     if n < 2:
> > > > >         return n
> > > > >     if n == 2:
> > > > >         return 1
> > > > >     return fib(n - 1) + fib(n - 2)
> > > > > ' '[fib(n) for n in range(20)]'
> > > >
> > > >
> > > >
> > > > # Yocto Python 3.8.3
> > > > # 10 loops, best of 10: 28.9 msec per loop
> > > > # 10 loops, best of 10: 29.3 msec per loop
> > > > # 10 loops, best of 10: 27.9 msec per loop
> > > > # 10 loops, best of 10: 30.4 msec per loop
> > > > # Average result: 31.625 msec per loop
> > > >
> > > >
> > > >
> > > > # Raspbian Python 3.8.3
> > > > # 50 loops, best of 10: 7.73 msec per loop
> > > > # 50 loops, best of 10: 7.72 msec per loop
> > > > # 50 loops, best of 10: 7.67 msec per loop
> > > > # 50 loops, best of 10: 7.74 msec per loop
> > > > # Average result: 7.715 msec per loop
> > > >
> > > >
> > > >
> > > > # Raspbian speedup: 4.09x
> > > >
> > > >
> > > >
> > > > Best,
> > > > Ryan Rowe
> > > >
> >
> >
>
>
>
>
> 
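A quick arithmetic check of the timings quoted above, as a sketch. The four
listed Yocto runs average 29.125 msec, not the stated 31.625 msec, while the
stated speedup of about 4.1x matches 31.625 / 7.715. Either one of the listed
runs or the stated average may have been mistyped; the thread does not say
which. The variable names below are illustrative only.

```python
import statistics

# Per-loop timings in msec, copied from the quoted results.
yocto_runs = [28.9, 29.3, 27.9, 30.4]
raspbian_runs = [7.73, 7.72, 7.67, 7.74]

yocto_mean = statistics.mean(yocto_runs)
raspbian_mean = statistics.mean(raspbian_runs)

# Ratio recomputed from the listed runs, not from the stated averages.
ratio = yocto_mean / raspbian_mean

print(f"Yocto mean:    {yocto_mean:.3f} msec per loop")
print(f"Raspbian mean: {raspbian_mean:.3f} msec per loop")
print(f"Raspbian speedup from listed runs: {ratio:.2f}x")
```

Whichever figure is right, the gap is large enough that the conclusion in the
thread does not change.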
-=-=-=-=-=-=-=-=-=-=-=-
View/Reply Online (#139676): 
https://lists.openembedded.org/g/openembedded-core/message/139676
-=-=-=-=-=-=-=-=-=-=-=-
