Thanks, Mr. Aleksandar, for the introduction.
I'm really looking forward to working with the QEMU developer community
this summer.
Wishing all of you health and safety.
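
Regarding the planned non-FP experiments mentioned below: for illustration,
a tiny integer/string workload of the kind that could be profiled under QEMU
user mode might look like this (the names and sizes are just my sketch, not
a finalized benchmark):

```python
# Illustrative non-FP workload: sort a deterministically generated list of
# strings. Pure integer/string work, no floating point involved.
import random

def make_strings(n, seed=42, length=8):
    """Generate n pseudo-random lowercase strings, reproducibly."""
    rng = random.Random(seed)
    return ["".join(rng.choice("abcdefghijklmnopqrstuvwxyz")
                    for _ in range(length))
            for _ in range(n)]

def benchmark():
    data = make_strings(100_000)
    data.sort()  # string comparison and pointer shuffling only
    return data[0], data[-1]

if __name__ == "__main__":
    first, last = benchmark()
    print(first, last)
```

Comparing the profile of something like this against an FP-heavy program
should make the softfloat contribution stand out clearly.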


On Sun, May 3, 2020, 1:25 AM Aleksandar Markovic <
aleksandar.qemu.de...@gmail.com> wrote:

> [correcting some email addresses]
>
> On Sun, 3 May 2020 at 01:20, Aleksandar Markovic <
> aleksandar.qemu.de...@gmail.com> wrote:
>
>> Hi, all.
>>
>> I just want to share with you some bits and pieces of data that I got
>> while doing some preliminary experimentation for the GSoC project "TCG
>> Continuous Benchmarking", which Ahmed Karaman, a student in the fourth
>> (final) year of the Faculty of Electrical Engineering in Cairo, will execute.
>>
>> *User Mode*
>>
>>    * As expected, for any program doing any substantial floating-point
>> calculation, the softfloat library is the heaviest consumer of CPU
>> cycles.
>>    * We plan to examine the performance behaviour of non-FP programs
>> (integer arithmetic), or even non-numeric programs (sorting strings, for
>> example).
>>
>> *System Mode*
>>
>>    * I profiled the boot of several machines using a tool called
>> callgrind (a part of valgrind). The tool offers a plethora of information;
>> however, it appears to be a little confused by the usage of coroutines,
>> which makes some of its reports look very illogical, or plain ugly. Still,
>> it seems valid data can be extracted from it. Without going into details,
>> here is what it says for one machine (bear in mind that results may vary
>> to a great extent between machines):
>>      ** The boot involved six threads: one for display handling, one
>> for emulation, and four more. The last four did almost nothing during
>> boot, sitting idle almost the entire time, waiting for something. In
>> terms of "Total Instruction Fetch Count" (the main measure used in
>> callgrind), the display thread and the emulation thread were in roughly
>> a 1:3 proportion (the remaining threads were negligible); interestingly
>> enough, for another machine that proportion was 1:20.
>>      ** The display thread is dominated by the vga_update_display()
>> function (21.5% "self" time, and 51.6% "self + callees" time, called
>> almost 40,000 times). Other functions worth mentioning are
>> cpu_physical_memory_snapshot_get_dirty() and
>> memory_region_snapshot_get_dirty(): both are very small functions, but
>> each is invoked over 26,000,000 times, and together they contribute over
>> 20% of the display thread's instruction fetch count.
>>      ** Focusing now on the emulation thread, its "Total Instruction
>> Fetch Count" was roughly distributed this way:
>>            - 15.7%: execution of JIT-ed code from the translation block
>> buffer
>>            - 39.9%: execution of helpers
>>            - 44.4%: the code translation stage, including some coroutine
>> activities
>>         Top two among helpers:
>>           - helper_le_stl_memory()
>>           - helper_lookup_tb_ptr() (this one is invoked a whopping
>> 36,000,000 times)
>>         Single largest instruction consumer within code translation:
>>           - liveness_pass_1(), which constitutes 21.5% of the entire
>> "emulation thread" consumption, or, put another way, almost half of the
>> code translation stage (which sits at 44.4%)
>>
>> Please take all this with a grain of salt, since these results are only
>> preliminary.
>>
>> I would like to use this opportunity to welcome Ahmed Karaman, a talented
>> young man from Egypt, to the QEMU development community; he will work on
>> the "TCG Continuous Benchmarking" project this summer. Please do help him
>> in his first steps as our colleague. Best of luck to Ahmed!
>>
>> Thanks,
>> Aleksandar
>>
>>
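
P.S. As a quick sanity check of the emulation-thread breakdown quoted above
(the percentages are copied from the report; the variable names are mine):

```python
# Emulation-thread "Total Instruction Fetch Count" shares, as reported by
# callgrind in the message above.
jit_pct = 15.7          # executing JIT-ed code from the translation block buffer
helpers_pct = 39.9      # executing helpers
translation_pct = 44.4  # code translation stage, incl. coroutine activity

# The three stages should account for essentially the whole thread.
total = jit_pct + helpers_pct + translation_pct
print(f"accounted for: {total:.1f}%")  # -> 100.0%

# liveness_pass_1() at 21.5% of the whole thread is indeed almost half of
# the 44.4% translation stage.
share = 21.5 / translation_pct
print(f"liveness_pass_1 share of translation: {share:.1%}")  # -> 48.4%
```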
