Re: Slowness with multi-thread TCG?

Matheus K. Ferst Wed, 29 Jun 2022 09:26:42 -0700

On 29/06/2022 12:36, Frederic Barrat wrote:



On 29/06/2022 00:17, Alex Bennée wrote:

If you run the sync-profiler (via the HMP "sync-profile on"), you can
then get a breakdown of which mutexes are being held and for how long
("info sync-profile").



Alex, a huge thank you!

For the record, the "info sync-profile" showed:
Type       Object          Call site                     Wait Time (s)     Count  Average (us)
----------------------------------------------------------------------------------------------
BQL mutex  0x55eb89425540  accel/tcg/cpu-exec.c:744           96.31578  73589937          1.31
BQL mutex  0x55eb89425540  target/ppc/helper_regs.c:207        0.00150      1178          1.27


And it points to a lock in the interrupt delivery path, in
cpu_handle_interrupt().

I now understand the root cause. The interrupt signal for the
decrementer interrupt remains set because the interrupt is not being
delivered, per the config. I'm not quite sure what the proper fix is yet
(there seem to be several implementations of the decrementer on ppc),
but at least I understand why we are so slow.


To summarize what we discussed elsewhere:

1 - The threads that are not decompressing the kernel have a pending
PPC_INTERRUPT_DECR, and cs->interrupt_request is CPU_INTERRUPT_HARD;
2 - cpu_handle_interrupt calls ppc_cpu_exec_interrupt, which calls
ppc_hw_interrupt to handle the interrupt;
3 - ppc_cpu_exec_interrupt decides that the interrupt cannot be
delivered immediately, so the corresponding bit in
env->pending_interrupts is not reset;
4 - ppc_cpu_exec_interrupt does not change cs->interrupt_request because
pending_interrupts != 0, so cpu_handle_interrupt will be called again.

Each pass through this loop acquires and releases the BQL
(qemu_mutex_lock_iothread), slowing down other threads that need this
lock.

With a quick hack, I could verify that, by moving that signal out of the
way, the kernel decompression time is now peanuts, no matter the
number of CPUs. Even with one CPU, the 15 seconds measured before were
already a huge waste, so it was not really a multiple-CPU problem.
Multiple CPUs were just highlighting it.

Thanks again!

   Fred

--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO <http://www.eldorado.org.br/>
Analista de Software
