On 12/09/2017 16:56, Thomas Huth wrote:
> The problem is that the SLOF firmware just performs very badly with TCG
> (it's fine on real hardware). It executes a lot of Forth code, and the
> Forth interpreter uses things like computed gotos or other tricks that
> basically prevent proper JIT operation here. I've done quite a bit of
> optimizations in SLOF in the past already, but I've got hardly any ideas
> left how to fix that further.

Two ideas for QEMU based on a quick "perf record" test:

- 25% of the time is spent in cpu_exec.  PPC doesn't use
tcg_gen_lookup_and_goto_ptr.  The main thing to be careful about is
that, whenever an interrupt is pending (e.g. after enabling them) you
need to force an exit to the loop.  See for example commits b29fd33db5
("target/arm: use DISAS_EXIT for eret handling", 2017-07-17) and
b74cddcbf6 ("target/mips: Use BS_EXCP where interrupts are expected",
2017-08-02).  On PPC this mostly means SPRs and env->msr writes.  Apart
from this, however, it shouldn't be hard to do.

- 8% of the time is spend in cpu_exec's call to
object_class_dynamic_cast_assert aka this line

    CPUClass *cc = CPU_GET_CLASS(cpu);

This maybe could avoid the dynamic cast.  But it's also possible that
fixing the first gets rid of this one too.

Thanks,

Paolo

Reply via email to