On 29/06/2022 00:17, Alex Bennée wrote:
> If you run the sync-profiler (via the HMP "sync-profile on") you can
> then get a breakdown of which mutex's are being held and for how long
> ("info sync-profile").


Alex, a huge thank you!

For the record, the "info sync-profile" showed:
Type       Object          Call site                     Wait Time (s)  Count     Average (us)
----------------------------------------------------------------------------------------------
BQL mutex  0x55eb89425540  accel/tcg/cpu-exec.c:744      96.31578       73589937  1.31
BQL mutex  0x55eb89425540  target/ppc/helper_regs.c:207  0.00150        1178      1.27
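As a quick sanity check on those numbers, the "Average (us)" column is just the wait time divided by the count. The short Python sketch below redoes that arithmetic using only the values reported above; the function name `average_us` is mine, not something from QEMU.

```python
# Figures copied from the "info sync-profile" output above:
# (call site, total wait time in seconds, number of acquisitions)
rows = [
    ("accel/tcg/cpu-exec.c:744", 96.31578, 73589937),
    ("target/ppc/helper_regs.c:207", 0.00150, 1178),
]


def average_us(wait_s, count):
    """Average wait per acquisition, in microseconds."""
    return wait_s / count * 1e6


total_wait = sum(wait_s for _, wait_s, _ in rows)

for site, wait_s, count in rows:
    share = wait_s / total_wait * 100
    print(f"{site}: {average_us(wait_s, count):.2f} us average, "
          f"{share:.3f}% of total wait")
```

Each acquisition is cheap (about 1.3 us); the cost comes from the first call site being hit more than 73 million times, which accounts for practically all of the wait time.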


And it points to a lock in the interrupt delivery path, in cpu_handle_interrupt().

I now understand the root cause. The interrupt signal for the decrementer interrupt remains set because the interrupt is not being delivered, per the config. I'm not quite sure what the proper fix is yet (there seem to be several implementations of the decrementer on ppc), but at least I understand why we are so slow.
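To illustrate the mechanism, here is a toy Python model. It is not QEMU code and does not reproduce cpu_handle_interrupt(); `ToyCPU`, `run` and `clear_when_masked` are hypothetical names made up for this sketch. It only assumes what is described above: a pending interrupt is handled under a big lock, and if the interrupt cannot be delivered the pending flag stays set, so the lock is taken again on every pass through the execution loop.

```python
import threading


class ToyCPU:
    """Hypothetical stand-in for a vCPU; not a QEMU structure."""

    def __init__(self, interrupts_enabled):
        self.interrupts_enabled = interrupts_enabled
        self.interrupt_pending = False


def run(cpu, big_lock, iterations, clear_when_masked):
    """Run a toy execution loop and return how often the lock was taken."""
    lock_acquisitions = 0
    cpu.interrupt_pending = True  # the decrementer fires once
    for _ in range(iterations):
        if cpu.interrupt_pending:
            # Interrupt handling happens under the big lock.
            with big_lock:
                lock_acquisitions += 1
                if cpu.interrupts_enabled:
                    cpu.interrupt_pending = False  # delivered
                elif clear_when_masked:
                    # Rough analogue of the quick hack:
                    # move the signal out of the way.
                    cpu.interrupt_pending = False
        # ... guest code would execute here ...
    return lock_acquisitions


lock = threading.Lock()
print(run(ToyCPU(False), lock, 1000, clear_when_masked=False))
print(run(ToyCPU(False), lock, 1000, clear_when_masked=True))
```

In this model, the masked case without the hack takes the lock on every iteration, while clearing the signal brings it down to a single acquisition. That is also consistent with the slowdown being visible with only one cpu.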

With a quick hack, I could verify that by moving that signal out of the way, the decompression time of the kernel is now peanuts, no matter the number of cpus. Even with one cpu, the 15 seconds measured before was already a huge waste, so it was not really a multiple-cpus problem. Multiple cpus were just highlighting it.

Thanks again!

  Fred
