Re: Applying code layout optimization to postgresql16 RPMs in Fedora 41 gave a 3%-6% improvement in IPC

William Cohen Thu, 21 Nov 2024 16:01:42 -0800

On 11/21/24 17:32, Miro Hrončok wrote:
> On 21. 11. 24 23:11, William Cohen wrote:
>> Sediment has been designed to work with the RPM build process.
>> Currently, one needs to use modified RPM macros.  These can be created
>> quickly by writing the output of the sediment make_sediment_rpmmacros
>> command into ~/.rpmmacros.  One will also need to set the pgo
>> macro to 1 for the rpmbuild process.  The rpm spec file has minimal
>> modifications.  It has the callgraph file stored as a source file and
>> defines the global call_graph to point at the source file that holds
>> the call graph.
> 
> Hey Will,
> 
> let's say I want to try this for Python. Where do I start? The README on 
> https://github.com/wcohen/sediment is not very helpful.
> 
> This is what I did based on your email:
> 
> $ sudo dnf --enable-repo=updates-testing install sediment
> ...
> Installing sediment-0:0.9.3-1.fc41.noarch
> 
> I ran make_sediment_rpmmacros, and it gave me some macros. Now I am supposed to
> put those into ~/.rpmmacros. Except I never build Python locally, I use Koji or
> mock. I can probably amend this to use %global and insert it into
> python3.14.spec. But what else do I need to do? Do you have a step-by-step kind
> of document I can follow?
>



Hi, Miro,

For the time being the builds need to have the macros defined, so koji isn't 
going to work.  You might be able to modify the RPM that provides the macros 
and have a mock environment that uses that special RPM.

It might be more productive to take some time to define what would provide good 
representative data for python.  If the profiling data is not representative, 
then the code layout changes are unlikely to provide much performance 
improvement.  For the postgresql16 example the training was done on pgbench.  
The postgres binary is pretty large, 9.4MB in size, and there appears to be a 
fair amount of jumping around in the code.  The code layout optimization reduces 
that.  For python I was thinking that one might be able to train on something 
that does python benchmarking.  Microbenchmarks of tight loops might not show 
much improvement, but maybe there are some other applications that could 
be used to compare the normal vs optimized builds.
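
A rough sketch of how such a comparison could be scored, since the improvement 
reported for postgresql16 was in IPC (instructions per cycle).  The helper names 
ipc and ipc_gain_percent are made up for illustration and are not part of 
sediment.  The counter values would come from something like 
"perf stat -e instructions:u,cycles:u -- <workload>" run once against the 
baseline build and once against the optimized build.

```shell
# Hypothetical helpers, not part of sediment or perf.

# ipc INSTRUCTIONS CYCLES
# Prints instructions per cycle with three decimal places.
ipc() {
    awk -v insns="$1" -v cycles="$2" \
        'BEGIN { if (cycles <= 0) exit 1; printf "%.3f\n", insns / cycles }'
}

# ipc_gain_percent BASELINE_IPC OPTIMIZED_IPC
# Prints the relative change of the optimized build over the baseline, in percent.
ipc_gain_percent() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { if (base <= 0) exit 1; printf "%.1f\n", (opt - base) / base * 100 }'
}
```

Both builds should be measured with the same workload on the same machine, 
otherwise the two IPC figures are not comparable.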

So far I have only collected data on x86_64 bare metal machines using something 
like the following (you might need to adjust the kernel settings to allow a 
normal user to collect data):

  perf record -e branches:u -j any_call python_executable_under_test
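
On the kernel setting: on Linux the usual knob for unprivileged perf access is 
kernel.perf_event_paranoid.  The sketch below only checks the current level 
against a limit you choose.  The helper name perf_paranoid_ok is hypothetical, 
and which level this particular perf record command needs is not stated here, so 
treat the limit as an assumption to experiment with.

```shell
# Hypothetical helper: succeed if the perf_event_paranoid level is at most MAX_LEVEL.
# perf_paranoid_ok MAX_LEVEL [SETTING_FILE]
# SETTING_FILE defaults to the real kernel setting; the argument exists so the
# check can be tried against an ordinary file.
perf_paranoid_ok() {
    max_level="$1"
    setting_file="${2:-/proc/sys/kernel/perf_event_paranoid}"
    level="$(cat "$setting_file")" || return 2
    [ "$level" -le "$max_level" ]
}

# Example (not run here; the value 1 is an assumption, and the change lasts
# only until the next reboot):
#   perf_paranoid_ok 1 || sudo sysctl kernel.perf_event_paranoid=1
```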

Once you have the data you can convert into a form that sediment can use with:

  perf report --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > python_pgo_data.out

Then convert it into the actual call graph file with:

  perf2gv < python_pgo_data.out > python_pgo_data.gv

The python_pgo_data.gv file is the file that would be used for the python build.  
You can also convert it into something more human readable that can be viewed 
in the browser with:

  dot -Tsvg -o python_pgo_data.svg python_pgo_data.gv

Taking a look at the svg file might give some ideas about what the hot and cold 
paths in the code are.  The ovals are the individual functions (there could be 
inlined functions in those, but we don't care about those).  The directed edges 
mark calls between the different functions.  Each edge is also labelled with the 
relative probability of that edge being taken.  The edge weights are used by 
sediment to figure out which functions should be grouped together.  The weights 
of all the edges should add up to 1.  This normalization makes it a bit 
easier to combine multiple callgraphs together.  The large rectangle around a 
group of functions identifies the binary.  I expect that you are going to be 
looking at the shared library libpython3.14.so.1.0 for optimization.
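
To illustrate what the normalization means, here is a small sketch that turns 
raw call counts into weights summing to 1.  It is not how perf2gv is 
implemented, and the input format (one "caller callee count" triple per line) is 
invented for this example; it is not the .gv format.  The function name 
normalize_edges is likewise hypothetical.

```shell
# Hypothetical helper: read "caller callee count" lines on stdin and print
# "caller callee weight", where weight = count / (sum of all counts).
normalize_edges() {
    awk '
        NF == 3 { caller[++n] = $1; callee[n] = $2; count[n] = $3; total += $3 }
        END {
            if (total <= 0) exit 1    # nothing to normalize
            for (i = 1; i <= n; i++)
                printf "%s %s %.4f\n", caller[i], callee[i], count[i] / total
        }'
}

# Example (made-up counts):
#   printf 'main parse 6\nparse lex 3\nmain report 1\n' | normalize_edges
```

Because every normalized graph sums to 1, graphs from runs of very different 
lengths end up on the same scale before they are combined.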

Once you have a python_pgo_data.gv file, create the .rpmmacros file:

  make_sediment_rpmmacros > ~/.rpmmacros

Edit the python3.14.spec file to include the python_pgo_data.gv file:

  SOURCE99: python_pgo_data.gv
  %global call_graph %{SOURCE99}

Then build it with 

  rpmbuild -ba --define "pgo 1" python3.14.spec

This should generate RPMs with _pgo in the RPM name.
To build baseline examples without the layout optimization:

  rpmbuild -ba python3.14.spec

I certainly would like to help get an optimized version of python built and am 
happy to help work through the issues.  Let me know if you have any other 
questions or if there are things that I can improve.

-Will

