Ahmed Karaman <ahmedkhaledkara...@gmail.com> writes:
> Hi,
>
> The second report of the TCG Continuous Benchmarking series builds
> upon the QEMU performance metrics calculated in the previous report.
> This report presents a method to dissect the number of instructions
> executed by a QEMU invocation into three main phases:
> - Code Generation
> - JIT Execution
> - Helpers Execution
> It devises a Python script that automates this process.
>
> After that, the report presents an experiment for comparing the
> output of running the script on 17 different targets. Many conclusions
> can be drawn from the results and two of them are discussed in the
> analysis section.

A couple of comments. One thing I think is missing from your analysis is
the total number of guest instructions being emulated. As you point out,
each guest will have different code efficiency in terms of its generated
code. Assuming your test case is constant execution (i.e. it runs the
same each time) you could run it through a plugins build to extract the
number of guest instructions, e.g. (a sketch of such a plugin follows at
the end of this mail):

  ./aarch64-linux-user/qemu-aarch64 -plugin tests/plugin/libinsn.so -d plugin ./tests/tcg/aarch64-linux-user/sha1
  SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
  insns: 158603512

I should have also pointed out in your last report that running FP-heavy
code will always be biased towards helper/softfloat code to the
detriment of everything else. I think you need more of a mix of
benchmarks to get a better view. When Emilio did the last set of
analysis he used a suite he built out of nbench and a Perl benchmark:

  https://github.com/cota/dbt-bench

As he quoted in his README:

  NBench programs are small, with execution time dominated by small code
  loops. Thus, when run under a DBT engine, the resulting performance
  depends almost entirely on the quality of the output code.

  The Perl benchmarks compile Perl code. As is common for compilation
  workloads, they execute large amounts of code and show no particular
  code execution hotspots. Thus, the resulting DBT performance depends
  largely on code translation speed.

By only having one benchmark you are going to miss out on the envelope
of use cases.

> Report link:
> https://ahmedkrmn.github.io/TCG-Continuous-Benchmarking/Dissecting-QEMU-Into-Three-Main-Parts/
>
> Previous reports:
> Report 1 - Measuring Basic Performance Metrics of QEMU:
> https://lists.gnu.org/archive/html/qemu-devel/2020-06/msg06692.html
>
> Best regards,
> Ahmed Karaman

--
Alex Bennée
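As referenced above, here is a minimal sketch of an instruction-counting
plugin. It is written against the public plugin API (qemu-plugin.h) but
simplified from the actual tests/plugin/insn.c source, so treat it as
illustrative rather than the shipped code:

  #include <inttypes.h>
  #include <stdio.h>

  #include <qemu-plugin.h>

  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;

  /* Plain counter: not atomic, which is fine for a single-threaded
   * linux-user test like sha1. */
  static uint64_t insn_count;

  /* Called once per executed guest instruction. */
  static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
  {
      insn_count++;
  }

  /* Called whenever a translation block is translated: register the
   * execution callback for every instruction in the block. */
  static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
  {
      size_t n = qemu_plugin_tb_n_insns(tb);
      for (size_t i = 0; i < n; i++) {
          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
          qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
                                                 QEMU_PLUGIN_CB_NO_REGS,
                                                 NULL);
      }
  }

  /* Print the total when the guest exits; this produces the
   * "insns: ..." line shown in the invocation above. */
  static void plugin_exit(qemu_plugin_id_t id, void *p)
  {
      char buf[64];
      snprintf(buf, sizeof(buf), "insns: %" PRIu64 "\n", insn_count);
      qemu_plugin_outs(buf);
  }

  QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                             const qemu_info_t *info,
                                             int argc, char **argv)
  {
      qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
      qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
      return 0;
  }

Built as a shared object and passed via -plugin as in the invocation
above. The per-instruction callback could also be swapped for the inline
QEMU_PLUGIN_INLINE_ADD_U64 op to cut the instrumentation overhead, which
is what the real insn.c offers behind its "inline" argument.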