On Sat, Feb 1, 2025 at 1:49 PM Miro Hrončok <mhron...@redhat.com> wrote:

> On 01. 02. 25 2:10, William Cohen wrote:
> > On 12/5/24 9:45 AM, Miro Hrončok wrote:
> >> On 04. 12. 24 20:32, William Cohen wrote:
> >>> On 11/21/24 17:32, Miro Hrončok wrote:
> >>>> On 21. 11. 24 23:11, William Cohen wrote:
> >>>>> Sediment has been designed to work with the RPM build process.
> >>>>> Currently, one needs to use modified RPM macros.  These can be
> >>>>> created quickly by writing the output of the sediment
> >>>>> make_sediment_rpmmacros command into ~/.rpmmacros.  One will also
> >>>>> need to define the pgo macro as 1 for the rpmbuild process.  The
> >>>>> RPM spec file needs only minimal modifications: the call graph is
> >>>>> stored as a source file, and the global call_graph macro is
> >>>>> defined to point at that source file.
> >>>>
> >>>> Hey Will,
> >>>>
> >>>> let's say I want to try this for Python. Where do I start? The
> >>>> README on https://github.com/wcohen/sediment is not very helpful.
> >>>>
> >>>> This is what I did based on your email:
> >>>>
> >>>> $ sudo dnf --enablerepo=updates-testing install sediment
> >>>> ...
> >>>> Installing sediment-0:0.9.3-1.fc41.noarch
> >>>>
> >>>> I ran make_sediment_rpmmacros and it gave me some macros. Now I am
> >>>> supposed to put those into ~/.rpmmacros. Except I never build
> >>>> Python locally; I use Koji or mock. I can probably amend this to
> >>>> use %global and insert it into python3.14.spec. But what else do I
> >>>> need to do? Do you have a step-by-step kind of document I can
> >>>> follow?
> >>>>
> >>>
> >>>
> >>> Hi Miro,
> >>>
> >>> The tooling doesn't yet fit your workflow of building packages in
> >>> koji and mock.  I am looking into ways of addressing that issue.
> >>>
> >>> In an earlier email I mentioned that the important thing is to
> >>> have good profiling data.  Do you have suggestions on some
> >>> benchmarks that would properly exercise the python interpreter?
> >>> I have used pyperformance
> >>> (https://github.com/python/pyperformance) to get some call graph
> >>> data for python and added that to a python3.13 srpm available at
> >>> https://koji.fedoraproject.org/koji/taskinfo?taskID=126526066.
> >>> Note that Koji is NOT building with the code layout optimization.
> >>> One would still need to build python3.13-3.13.0-1.fc41.src.rpm
> >>> locally with sediment-0.9.4
> >>> (https://koji.fedoraproject.org/koji/buildinfo?buildID=2596791)
> >>> installed and ~/.rpmmacros set up, following these steps:
> >>>
> >>>      make_sediment_rpmmacros > ~/.rpmmacros
> >>>      rpm -Uvh python3.13-3.13.0-1.fc41.src.rpm
> >>>      cd ~/rpmbuild/SPECS
> >>>      rpmbuild -ba --define "pgo 1" python3.13.spec
> >>>
> >>> The notable difference in the python3.13.spec file is the addition of:
> >>>
> >>> # Call graph information
> >>> SOURCE12: perf_pybenchmark.gv
> >>> %global call_graph %{SOURCE12}
> >>>
> >>> The perf_pybenchmark.gv was generated with these steps:
> >>>
> >>>      python3 -m pip install pyperformance
> >>>      perf record -e branches:u -j any_call -o perf_pybenchmark.data \
> >>>          pyperformance run -f -o fc41_x86_python_baseline.json
> >>>      perf report -i perf_pybenchmark.data --no-demangle \
> >>>          --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
> >>>      perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv
> >>>
> >>> Added the file to the python srpm:
> >>>
> >>>      cp perf_pybenchmark.gv ~/rpmbuild/SOURCES/.
> >>>      # edit ~/rpmbuild/SPECS/python3.13.spec to add call graph info
> >>>
> >>> The improvements were mixed between the code-layout-optimized
> >>> python and the baseline version on the pyperformance benchmarks.
> >>> This can be seen in the attached python_pgo.out generated by:
> >>>
> >>>      python3 -m pyperf compare_to fc41_x86_python_baseline.json \
> >>>          fc41_x86_python_pgo.json --table > python_pgo.out
> >>>
> >>> It looks like a number of the benchmarks are microbenchmarks that
> >>> are unlikely to benefit much from the code layout optimizations.
> >>>
> >>> Are there other python performance tests that you would suggest
> >>> that have a larger footprint and would better gauge the possible
> >>> performance improvement from the code layout optimization?
> >>>
> >>> Are there better python code examples to collect profiling data on?
> >>
> >> Hey Will,
> >>
> >> thanks for looking into this.
> >>
> >> For your question: Upstream is using this for PGO:
> >>
> >>    $ python3.14 -m test --pgo
> >>
> >> Or:
> >>
> >>    $ python3.14 -m test --pgo-extended
> >>
> >> In spec, this can be used:
> >>
> >>    LD_LIBRARY_PATH=./build/optimized ./build/optimized/python -m test ...
> >>
> >> ---
> >>
> >> What is the blocker to run this in Koji/mock?
> >>
> >> You do `make_sediment_rpmmacros > ~/.rpmmacros`.
> >>
> >> What's the issue with %defining such macros at spec level?
> >>
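[A sketch of what defining these at spec level might look like, using only the macros that appear elsewhere in this thread. Note the modified build macros emitted by make_sediment_rpmmacros would still need to reach the build environment somehow; this fragment covers only the per-package parts:]

```
# Build conditional, so "rpmbuild --without opt" disables the
# optimization; with it enabled this replaces --define "pgo 1"
# on the rpmbuild command line.
%bcond_without opt
%if %{with opt}
%global pgo 1
%endif

# Call graph information
SOURCE12: perf_pybenchmark.gv
%global call_graph %{SOURCE12}
```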
> >
> > Hi,
> >
> > I was able to do some experiments with the koji/mock-buildable
> > python3.13-3.13.0-1.fc41_opt.src.rpm
> > (https://koji.fedoraproject.org/koji/taskinfo?taskID=128437060) and
> > get better measurements of the performance impact with vstinner's
> > suggestions for profiling python. On a Lenovo P51 laptop running
> > Fedora 41 I built two versions of the rpms. Training data was
> > collected on a pyperformance run and analyzed using the sediment
> > tool with:
> >
> >     python3 -m pip install pyperformance
> >     perf record -e branches:u -j any_call -o perf_pybenchmark.data \
> >         pyperformance run -f -o fc41_x86_python_baseline.json
> >     perf report -i perf_pybenchmark.data --no-demangle \
> >         --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
> >     perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv
> >
> > Installed the srpm, went into the SPECS directory, and built the
> > code-layout-optimized RPMs (which have _opt added to their names) with:
> >
> >     rpm -Uvh python3.13-3.13.0-1.fc41_opt.src.rpm
> >     cd ~/rpmbuild/SPECS
> >     rpmbuild -ba python3.13.spec
> >
> > Built RPMs without the code-layout optimization (no _opt in the RPM
> names):
> >
> >    rpmbuild --without opt -ba python3.13.spec
> >
> > Installed the code-layout RPMs, set up the environment for benchmarking,
> and ran the tests:
> >
> >    sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41_opt* \
> >        ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41_opt.noarch.rpm
> >    sudo python3 -m pyperf system tune
> >    pyperformance run -f -o fc41_x86_python_opt20250131.json >& fc41_pybench_opt_20250131.log
> >
> > Then collected data for the non-optimized version of the rpms:
> >
> >    sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41.* \
> >        ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41.noarch.rpm
> >    sudo python3 -m pyperf system tune
> >    pyperformance run -f -o fc41_x86_python_20250131.json >& fc41_pybench_20250131.log
> >
> > Once done, I compared the data between the runs with:
> >
> >   python3 -m pyperf compare_to fc41_x86_python_20250131.json \
> >       fc41_x86_python_opt20250131.json --table > python_opt.out
> >
> > Below is the comparison between the two versions (python_opt.out).
> > For the vast majority of the benchmarks the optimized code is
> > slightly faster, typically by about 1%.  The regex_* benchmarks
> > showed the largest benefit, with regex_dna being 1.04x faster.
> > Several benchmarks are slightly slower: pickle, pickle_dict,
> > create_gc_cycles, spectral_norm, and typing_runtime_protocols.  The
> > unpack_sequence benchmark was the worst, being 1.12x slower with the
> > optimized code.  The improvements are not as noticeable as what was
> > seen with postgresql.  I suspect this is because pyperformance
> > consists of microbenchmarks that do not put as much pressure on the
> > iTLB as the large postgresql binary does.
>
> Thank you, Will!
>
> I've CC'ed Charalampos, who is now looking into Python performance in
> Fedora+EL.
>
> --
> Miro Hrončok
> --
> Phone: +420777974800
> Fedora Matrix: mhroncok
>
>
That's interesting, actually, although the main issue would be gathering
representative perf data, which Python supports
<https://docs.python.org/3/howto/perf_profiling.html> since 3.12. I
suspect the tests run for PGO would make for an interesting case here,
but it will require some experimentation.
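For what it's worth, besides launching the interpreter with `python -X
perf`, the trampoline can be toggled programmatically; a minimal sketch
(assumes a Linux CPython 3.12+ built with perf trampoline support):

```python
import sys


def enable_perf_profiling():
    """Turn on the perf trampoline so perf can attribute samples to
    pure-Python frames (CPython writes a /tmp/perf-<pid>.map file).
    Returns True on success, False where unsupported."""
    if not hasattr(sys, "activate_stack_trampoline"):
        return False  # pre-3.12 interpreter, or a platform without it
    try:
        sys.activate_stack_trampoline("perf")
        return True
    except Exception:
        # Be liberal here: the exact exception depends on how the
        # interpreter was built (e.g. without trampoline support).
        return False


if enable_perf_profiling():
    print("perf trampoline active")
else:
    print("perf trampoline unavailable on this build")
```

With the trampoline active, `perf record` on the benchmark run should
then show Python function names rather than lumping everything under
_PyEval_EvalFrameDefault.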

William, do I understand correctly that sediment uses the profile data to
reorder the functions with "--section-ordering-file"? If so, could it be
used in conjunction with AutoFDO (i.e. the same profiling data)? Also,
have you encountered any conflicts or issues with LTO and/or PLO?
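[My understanding of the mechanism, for anyone following along: with the
code compiled using -ffunction-sections, each function lands in its own
.text.<name> input section, and the ordering file passed to the linker
(e.g. via -Wl,--section-ordering-file,function-order.txt, which needs a
recent enough binutils ld) is just a plain list of those section names in
the desired layout order. A hypothetical fragment, function names
invented purely for illustration:]

```
.text._PyEval_EvalFrameDefault
.text.PyObject_GetAttr
.text._PyObject_Malloc
```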

I'd have a look down the line when I get some free cycles.

Side note: You mention the GCC python plugin in your docs; however, it
has not seen any active development for a long time, so the data
extracted from it can be inconclusive.
-- 
Regards,

Charalampos Stratakis
Senior Software Engineer
Python Maintenance Team, Red Hat
-- 
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: 
https://pagure.io/fedora-infrastructure/new_issue
