On Sat, Feb 1, 2025 at 1:49 PM Miro Hrončok <mhron...@redhat.com> wrote:
> On 01. 02. 25 2:10, William Cohen wrote:
> > On 12/5/24 9:45 AM, Miro Hrončok wrote:
> >> On 04. 12. 24 20:32, William Cohen wrote:
> >>> On 11/21/24 17:32, Miro Hrončok wrote:
> >>>> On 21. 11. 24 23:11, William Cohen wrote:
> >>>>> Sediment has been designed to work with the RPM build process.
> >>>>> Currently, one needs to use modified RPM macros. These can be created
> >>>>> quickly by writing the output of the sediment make_sediment_rpmmacros
> >>>>> command into ~/.rpmmacros. One will also need to set the pgo macro
> >>>>> to 1 for the rpmbuild process. The rpm spec file needs only minimal
> >>>>> modifications: the call graph file is stored as a source file, and
> >>>>> the global call_graph macro is defined to point to that source file.
> >>>>
> >>>> Hey Will,
> >>>>
> >>>> let's say I want to try this for Python. Where do I start? The README
> >>>> on https://github.com/wcohen/sediment is not very helpful.
> >>>>
> >>>> This is what I did based on your email:
> >>>>
> >>>> $ sudo dnf --enable-repo=updates-testing install sediment
> >>>> ...
> >>>> Installing sediment-0:0.9.3-1.fc41.noarch
> >>>>
> >>>> I ran make_sediment_rpmmacros, and it gave me some macros. Now I am
> >>>> supposed to put those in ~/.rpmmacros. Except I never build Python
> >>>> locally; I use Koji or mock. I can probably amend this to use %global
> >>>> and insert it into python3.14.spec. But what else do I need to do? Do
> >>>> you have a step-by-step document I can follow?
> >>>>
> >>>
> >>>
> >>> Hi Miro,
> >>>
> >>> The tooling doesn't yet fit your workflow of building packages in
> >>> Koji and mock. I am looking into ways of addressing that issue.
> >>>
> >>> In an earlier email I mentioned that the important thing was to have
> >>> good profiling data. Do you have suggestions for benchmarks that would
> >>> properly exercise the Python interpreter?
> >>> I have used pyperformance
> >>> (https://github.com/python/pyperformance) to get some call graph data
> >>> for Python and added that to a python3.13 srpm available at
> >>> https://koji.fedoraproject.org/koji/taskinfo?taskID=126526066. Note
> >>> that the Koji build does NOT include the code layout optimization. One
> >>> would still need to build python3.13-3.13.0-1.fc41.src.rpm locally with
> >>> sediment-0.9.4 (https://koji.fedoraproject.org/koji/buildinfo?buildID=2596791)
> >>> installed and ~/.rpmmacros set up, using the following steps:
> >>>
> >>> make_sediment_rpmmacros > ~/.rpmmacros
> >>> rpm -Uvh python3.13-3.13.0-1.fc41.src.rpm
> >>> cd ~/rpmbuild/SPECS
> >>> rpmbuild -ba --define "pgo 1" python3.13.spec
> >>>
> >>> The notable difference in the python3.13.spec file is the addition of:
> >>>
> >>> # Call graph information
> >>> SOURCE12: perf_pybenchmark.gv
> >>> %global call_graph %{SOURCE12}
> >>>
> >>> The perf_pybenchmark.gv was generated with these steps:
> >>>
> >>> python3 -m pip install pyperformance
> >>> perf record -e branches:u -j any_call -o perf_pybenchmark.data pyperformance run -f -o fc41_x86_python_baseline.json
> >>> perf report -i perf_pybenchmark.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
> >>> perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv
> >>>
> >>> Then I added the file to the Python srpm:
> >>>
> >>> cp perf_pybenchmark.gv ~/rpmbuild/SOURCES/.
> >>> # edit ~/rpmbuild/SPECS/python3.13.spec to add call graph info
> >>>
> >>> The results were mixed when comparing the code layout optimized
> >>> Python with the baseline on the pyperformance benchmarks. This can be
> >>> seen in the attached python_pgo.out, generated by:
> >>>
> >>> python3 -m pyperf compare_to fc41_x86_python_baseline.json fc41_x86_python_pgo.json --table > python_pgo.out
> >>>
> >>> It looks like a number of the benchmarks are microbenchmarks that are
> >>> unlikely to benefit much from the code layout optimizations.
> >>>
> >>> Are there other Python performance tests that you would suggest that
> >>> have a larger footprint and would better gauge the possible
> >>> performance improvement from the code layout optimization?
> >>>
> >>> Are there better Python code examples to collect profiling data on?
> >> Hey Will,
> >>
> >> thanks for looking into this.
> >>
> >> For your question: Upstream is using this for PGO:
> >>
> >> $ python3.14 -m test --pgo
> >>
> >> Or:
> >>
> >> $ python3.14 -m test --pgo-extended
> >>
> >> In the spec, this can be used:
> >>
> >> LD_LIBRARY_PATH=./build/optimized ./build/optimized/python -m test ...
> >>
> >> ---
> >>
> >> What is the blocker to running this in Koji/mock?
> >>
> >> You do `make_sediment_rpmmacros > ~/.rpmmacros`.
> >>
> >> What's the issue with %defining such macros at the spec level?
> >>
> >
> > Hi,
> >
> > I was able to do some experiments with the Koji/mock-buildable
> > python3.13-3.13.0-1.fc41_opt.src.rpm
> > (https://koji.fedoraproject.org/koji/taskinfo?taskID=128437060) and,
> > following vstinner's suggestions for profiling Python, get better
> > measurements of the performance impact. On a Lenovo P51 laptop running
> > Fedora 41 I built two versions of the RPMs.
> > Training data was collected from a pyperformance run and analyzed
> > using the sediment tool with:
> >
> > python3 -m pip install pyperformance
> > perf record -e branches:u -j any_call -o perf_pybenchmark.data pyperformance run -f -o fc41_x86_python_baseline.json
> > perf report -i perf_pybenchmark.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
> > perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv
> >
> > Installed the srpm, went into the SPECS directory, and built the code
> > layout optimized RPMs (these have _opt added to their names) with:
> >
> > rpm -Uvh python3.13-3.13.0-1.fc41_opt.src.rpm
> > cd ~/rpmbuild/SPECS
> > rpmbuild -ba python3.13.spec
> >
> > Built RPMs without the code layout optimization (no _opt in the RPM
> > names) with:
> >
> > rpmbuild --without opt -ba python3.13.spec
> >
> > Installed the code layout optimized RPMs, set up the environment for
> > benchmarking, and ran the tests:
> >
> > sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41_opt* ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41_opt.noarch.rpm
> > sudo python3 -m pyperf system tune
> > pyperformance run -f -o fc41_x86_python_opt20250131.json >& fc41_pybench_opt_20250131.log
> >
> > Then collected data for the non-optimized version of the RPMs:
> >
> > sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41.* ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41.noarch.rpm
> > sudo python3 -m pyperf system tune
> > pyperformance run -f -o fc41_x86_python_20250131.json >& fc41_pybench_20250131.log
> >
> > Once done, compared the data between the runs with:
> >
> > python3 -m pyperf compare_to fc41_x86_python_20250131.json fc41_x86_python_opt20250131.json --table > python_opt.out
> >
> > Below is the comparison between the two versions (python_opt.out). For
> > the vast majority of the benchmarks, the optimized code is slightly
> > faster, typically by about 1%.
> > The regex_* benchmarks appeared to benefit the most, with regex_dna
> > being 1.04x faster. Several benchmarks are slightly slower: pickle,
> > pickle_dict, create_gc_cycles, spectral_norm, and
> > typing_runtime_protocols. unpack_sequence was the worst, being 1.12x
> > slower for the optimized code. The improvements are not as noticeable
> > as what was seen with postgresql. I suspect this is because
> > pyperformance's microbenchmarks do not put as much pressure on the
> > iTLB as the large postgresql binary does.
>
> Thank you, Will!
>
> I've CC'ed Charalampos, who is now looking into Python performance in
> Fedora+EL.
>
> --
> Miro Hrončok
> --
> Phone: +420777974800
> Fedora Matrix: mhroncok
>
>

That's interesting, although the main issue would be gathering
representative perf data, which Python supports since 3.12
<https://docs.python.org/3/howto/perf_profiling.html>. I suspect the
tests run for PGO would make for an interesting case here, but it will
require some experimentation.

William, do I understand correctly that sediment uses the profile data
to reorder the functions with "--section-ordering-file"? If so, could it
be used in conjunction with AutoFDO (i.e., with the same profiling
data)? Also, have you encountered any conflicts or issues with LTO
and/or PLO?

I'll have a look down the line when I get some free cycles.

Side note: your docs mention the GCC python plugin; however, it has not
seen any active development for a long time, so the data extracted from
it can be inconclusive.

--
Regards,

Charalampos Stratakis
Senior Software Engineer
Python Maintenance Team, Red Hat
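In the thread's workflow, `perf report` output is piped through sediment's `perf2gv` to produce the Graphviz call graph that the spec file references as `call_graph`. The Python sketch below is a hypothetical, simplified stand-in for that step, not perf2gv's code. It assumes each data line of the `--sort=comm,dso_from,symbol_from,dso_to,symbol_to` report looks like `<overhead>% <comm> <dso_from> [.] <symbol_from> <dso_to> [.] <symbol_to>` (perf's real column layout may differ), that symbol names contain no spaces (true for plain C symbols such as CPython's), and it ignores the command and DSO columns that a real tool would likely use for filtering. `parse_report` and `to_dot` are illustrative names.

```python
"""Hypothetical, simplified stand-in for the perf2gv step (not its real code)."""
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Edge = Tuple[str, str, float]  # (caller symbol, callee symbol, overhead %)


def parse_report(lines: Iterable[str]) -> Iterator[Edge]:
    """Yield call edges from perf report lines in an *assumed* layout:
    '<overhead>% <comm> <dso_from> [.] <symbol_from> <dso_to> [.] <symbol_to>'.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and perf's '#' header/comment lines
        # Drop privilege-level markers such as "[.]" (user) and "[k]" (kernel).
        fields = [f for f in line.split() if f not in ("[.]", "[k]")]
        if len(fields) != 6 or not fields[0].endswith("%"):
            continue  # not a data line in the assumed layout
        overhead, _comm, _dso_from, sym_from, _dso_to, sym_to = fields
        yield sym_from, sym_to, float(overhead.rstrip("%"))


def to_dot(edges: Iterable[Edge]) -> str:
    """Sum the weight of repeated caller->callee pairs and emit a Graphviz digraph."""
    weights = defaultdict(float)
    for caller, callee, weight in edges:
        weights[(caller, callee)] += weight
    out = ["digraph callgraph {"]
    for (caller, callee), weight in sorted(weights.items()):
        # Use a label rather than dot's integer-only 'weight' attribute.
        out.append(f'  "{caller}" -> "{callee}" [label="{weight:g}"];')
    out.append("}")
    return "\n".join(out)
```

Applied to real data this would be something like `to_dot(parse_report(open("perf_pybenchmark.out")))`, but whether the output matches the `.gv` format sediment actually consumes is untested.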
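Will's figures such as "1.04x faster" (regex_dna) and "1.12x slower" (unpack_sequence) come from `pyperf compare_to`, which compares the mean timings of the two runs. The sketch below illustrates that arithmetic under the assumption that the ratio is optimized time over baseline time, inverted for display when it is below 1. It is not pyperf's implementation (pyperf also tests whether each difference is statistically significant, which this skips), and `speedup_label` and `geometric_mean_ratio` are hypothetical helpers.

```python
"""Illustration of pyperf-style speedup labels and a geometric-mean summary.
A sketch of the arithmetic only, not pyperf's implementation."""
import math
from typing import Iterable, Tuple


def speedup_label(ref_mean: float, changed_mean: float) -> str:
    """Describe changed vs. ref for a timing benchmark (lower time is better)."""
    ratio = changed_mean / ref_mean
    if ratio < 1.0:
        return f"{1.0 / ratio:.2f}x faster"
    if ratio > 1.0:
        return f"{ratio:.2f}x slower"
    return "no change"  # exact equality only; no significance test here


def geometric_mean_ratio(pairs: Iterable[Tuple[float, float]]) -> float:
    """Geometric mean of changed/ref time ratios; below 1.0 means faster overall."""
    logs = [math.log(changed / ref) for ref, changed in pairs]
    if not logs:
        raise ValueError("need at least one (ref, changed) pair")
    return math.exp(sum(logs) / len(logs))
```

For example, a benchmark whose mean time drops from a hypothetical 1.04 s to 1.00 s is labeled `1.04x faster`. The geometric mean is used for the summary because it treats a 2x speedup and a 2x slowdown symmetrically (they combine to 1.0), whereas the arithmetic mean of the ratios 0.5 and 2.0 would be 1.25.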
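Charalampos asks whether sediment feeds the profile into the linker via "--section-ordering-file". Without assuming how sediment actually does it, the sketch below shows one common idea behind profile-guided function ordering: a simplified greedy chain merge in the spirit of Pettis and Hansen's algorithm, which places hot caller/callee pairs next to each other so hot code spans fewer pages and iTLB entries. It takes (caller, callee, weight) edges like those in a call graph and returns a function order. `order_functions` is a hypothetical name, the real Pettis-Hansen algorithm also chooses chain orientation (which this skips), and turning the order into a linker ordering file is left out because that format depends on the linker.

```python
"""Hypothetical sketch of profile-guided function ordering with a simplified,
Pettis-Hansen-style greedy chain merge. Not sediment's actual algorithm."""
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (caller, callee, weight from the profile)


def order_functions(edges: Iterable[Edge]) -> List[str]:
    """Return a function layout order that keeps hot caller/callee pairs adjacent."""
    edges = list(edges)
    chain_of: Dict[str, int] = {}      # function -> id of the chain holding it
    chains: Dict[int, List[str]] = {}  # chain id -> functions in layout order
    heat: Dict[int, float] = {}        # chain id -> total weight of its edges

    # Start with every function in its own single-element chain.
    for caller, callee, _ in edges:
        for fn in (caller, callee):
            if fn not in chain_of:
                cid = len(chains)
                chain_of[fn] = cid
                chains[cid] = [fn]
                heat[cid] = 0.0

    # Visit edges hottest first; append the callee's chain to the caller's.
    for caller, callee, weight in sorted(edges, key=lambda e: e[2], reverse=True):
        a, b = chain_of[caller], chain_of[callee]
        heat[a] += weight
        if a == b:
            continue  # both functions are already in the same chain
        chains[a].extend(chains[b])
        for fn in chains[b]:
            chain_of[fn] = a
        heat[a] += heat.pop(b)
        del chains[b]

    # Emit the hottest chains first so frequently executed code is packed together.
    order: List[str] = []
    for cid in sorted(chains, key=lambda c: heat[c], reverse=True):
        order.extend(chains[cid])
    return order
```

A real tool would usually also restrict the edges to a single DSO (for example libpython) before ordering, since the linker can only reorder functions within the binary it is linking.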
--
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue