On 12/09/2017 16:56, Thomas Huth wrote:
> The problem is that the SLOF firmware just performs very badly with TCG
> (it's fine on real hardware). It executes a lot of Forth code, and the
> Forth interpreter uses things like computed gotos or other tricks that
> basically prevent proper JIT operation here. I've done quite a bit of
> optimizations in SLOF in the past already, but I've got hardly any ideas
> left how to fix that further.
Two ideas for QEMU based on a quick "perf record" test:

- 25% of the time is spent in cpu_exec. PPC doesn't use
  tcg_gen_lookup_and_goto_ptr. The main thing to be careful about is
  that, whenever an interrupt is pending (e.g. after enabling them), you
  need to force an exit to the cpu_exec loop. See for example commits
  b29fd33db5 ("target/arm: use DISAS_EXIT for eret handling", 2017-07-17)
  and b74cddcbf6 ("target/mips: Use BS_EXCP where interrupts are
  expected", 2017-08-02). On PPC this mostly means SPR and env->msr
  writes. Apart from this, however, it shouldn't be hard to do.

- 8% of the time is spent in cpu_exec's call to
  object_class_dynamic_cast_assert, i.e. this line:

      CPUClass *cc = CPU_GET_CLASS(cpu);

  This could perhaps avoid the dynamic cast. But it's also possible that
  fixing the first issue gets rid of this one too.

Thanks,

Paolo
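To make the first point concrete, here is a toy model in C of why jumping directly from one translated block to the next is cheaper than returning to cpu_exec every time, and why that chain must be broken as soon as an interrupt becomes pending. This is not QEMU code: ToyCPU, ToyTB, tb_table, lookup_next and cpu_exec_toy are all hypothetical names, and lookup_next only stands in, loosely, for what a lookup-and-goto-ptr exit does at the end of a block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only, NOT QEMU code. A "translation block" is a function that
 * runs some guest code and returns the next guest PC. */
typedef struct ToyCPU ToyCPU;
typedef uint32_t (*ToyTB)(ToyCPU *cpu);

struct ToyCPU {
    uint32_t pc;
    bool interrupt_pending;  /* set e.g. by a guest write enabling interrupts */
    int exits_to_loop;       /* trips through the slow outer loop */
    int chained;             /* block-to-block jumps that skipped the loop */
    int interrupts_taken;    /* interrupts delivered by the outer loop */
};

#define NBLOCKS 4u

static uint32_t block0(ToyCPU *cpu) { (void)cpu; return 1; }
static uint32_t block1(ToyCPU *cpu) { (void)cpu; return 2; }
/* Models a block ending in an MSR-style write that enables interrupts. */
static uint32_t block2(ToyCPU *cpu) { cpu->interrupt_pending = true; return 3; }
static uint32_t block3(ToyCPU *cpu) { (void)cpu; return 0; }

static const ToyTB tb_table[NBLOCKS] = { block0, block1, block2, block3 };

/* Hypothetical stand-in for a lookup-and-goto-ptr exit: find the next block
 * and jump to it directly, or return NULL to fall back to the outer loop.
 * Returning NULL while an interrupt is pending is the "force an exit" rule. */
static ToyTB lookup_next(const ToyCPU *cpu)
{
    if (cpu->interrupt_pending || cpu->pc >= NBLOCKS) {
        return NULL;
    }
    return tb_table[cpu->pc];
}

/* Hypothetical stand-in for the cpu_exec loop: interrupts are only
 * delivered here, never inside a chain of blocks. */
static void cpu_exec_toy(ToyCPU *cpu, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        if (cpu->interrupt_pending) {
            cpu->interrupt_pending = false;
            cpu->interrupts_taken++;
        }
        cpu->exits_to_loop++;
        ToyTB tb = lookup_next(cpu);
        while (tb != NULL) {
            cpu->pc = tb(cpu);
            tb = lookup_next(cpu);
            if (tb != NULL) {
                cpu->chained++;
            }
        }
    }
}
```

Starting from a zeroed ToyCPU, cpu_exec_toy(&cpu, 3) makes 3 trips through the outer loop but 8 direct block-to-block jumps, and delivers 2 interrupts. If lookup_next ignored interrupt_pending, the chain block3 → block0 → block1 → block2 → block3 would never return to the outer loop, so the interrupt raised by block2 would never be taken; that is the hazard the commits cited above guard against.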