Ahmed Karaman <ahmedkhaledkara...@gmail.com> writes:
> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.

A couple of comments. One thing I think is missing from your analysis is
the total number of guest instructions being emulated. As you point out,
each guest will have different code efficiency in terms of its generated
code. Assuming your test case is constant execution (i.e. it runs the
same each time) you could run it through a plugins build to extract the
number of guest instructions, e.g. (a sketch of such a plugin follows at
the end of this mail):

  ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  insns: 158603512

I should have also pointed out in your last report that running FP-heavy
code will always be biased towards helper/softfloat code to the
detriment of everything else. I think you need more of a mix of
benchmarks to get a better view. When Emilio did the last set of
analysis he used a suite he built out of nbench and a Perl benchmark:

  https://github.com/cota/dbt-bench

As he quoted in his README:

  NBench programs are small, with execution time dominated by small code
  loops. Thus, when run under a DBT engine, the resulting performance
  depends almost entirely on the quality of the output code.

  The Perl benchmarks compile Perl code. As is common for compilation
  workloads, they execute large amounts of code and show no particular
  code execution hotspots. Thus, the resulting DBT performance depends
  largely on code translation speed.

By only having one benchmark you are going to miss out on the envelope
of use cases.

> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman

--
Alex Bennée
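As referenced above, here is a minimal sketch of an instruction-counting
plugin. It is written against the public plugin API (qemu-plugin.h) but
simplified from the actual tests/plugin/insn.c source, so treat it as
illustrative rather than the shipped code:

  #include <inttypes.h>
  #include <stdio.h>

  #include <qemu-plugin.h>

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

  /* Plain counter: not atomic, which is fine for a single-threaded
   * linux-user test like sha1. */
  static uint64_t insn_count;

  /* Called once per executed guest instruction. */
  static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
  {
      insn_count++;
  }

  /* Called whenever a translation block is translated: register the
   * execution callback for every instruction in the block. */
  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
  {
      size_t n = qemu_plugin_tb_n_insns(tb);
      for (size_t i = 0; i < n; i++) {
          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
          qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
                                                 QEMU_PLUGIN_CB_NO_REGS,
                                                 NULL);
      }
  }

  /* Print the total when the guest exits; this produces the
   * "insns: ..." line shown in the invocation above. */
  static void plugin_exit(qemu_plugin_id_t id, void *p)
  {
      char buf[64];
      snprintf(buf, sizeof(buf), "insns: %" PRIu64 "\n", insn_count);
      qemu_plugin_outs(buf);
  }

  QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                             const qemu_info_t *info,
                                             int argc, char **argv)
  {
      qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
      qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
      return 0;
  }

Built as a shared object and passed via -plugin as in the invocation
above. The per-instruction callback could also be swapped for the inline
QEMU_PLUGIN_INLINE_ADD_U64 op to cut the instrumentation overhead, which
is what the real insn.c offers behind its "inline" argument.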