On 2015-08-19 12:41, Artyom Tarasenko wrote:
> Hi Richard,
>
> On Tue, Aug 18, 2015 at 7:55 PM, Richard Henderson <r...@twiddle.net> wrote:
> > On 08/18/2015 02:24 AM, Artyom Tarasenko wrote:
> >> The unoptimized case is a sequence of multiple cmp and branch
> >> operations (likely created by a "case" statement in the original
> >> source code), especially where cmp is in a delay slot of a branch
> >> instruction.
> >
> > Interesting.
> >
> >> I wonder whether we always have to finish a TB on a conditional jump.
> >> Maybe it would make sense to translate further if a destination of a
> >> jump is not too far from dc->pc? The definition of "not too far" is
> >> indeed tricky.
> >
> > We can only handle two chained exits from a TB. If we continue past
> > a conditional branch, we may well encounter a second conditional
> > branch, which would leave us with three different exits from the TB.
> >
> > Something that may be interesting to play with, however, is to change
> > the TB with which the insn in a delay slot is connected.
> >
> > For instance, we currently spend some amount of effort computing and
> > saving the branch condition, so that we can then execute the delay
> > slot, and afterwards use the saved branch condition to perform the
> > branch.
> >
> > Another way of doing this is to immediately branch, exiting the TB.
> > But we set up PC+NPC for the next TB such that the delay slot is the
> > first insn that is executed within the next TB. In that way, the
> > compare in the delay slot that you mention *is* in the same TB as the
> > branch that uses it, allowing the case to be optimized.
> >
> > This could wind up creating more TBs than the current solution, so
> > it's not clear that it would be a win. One can mitigate that somewhat
> > by noticing the case where the delay slot is a nop. I do think it's
> > worth an experiment.
>
> So it is possible to make a TB with non sequential instructions?
> The instruction in the delay slot would be located most likely
> elsewhere than the following instructions.
>
> But I think I've been chasing a red herring. I see those helpers in
> perf top when running sysbench, but not when running g++ (and at the
> end g++ is much more relevant benchmark for me):
>
>
> Samples: 83K of event 'cpu-clock', Event count (approx.): 15333243164,
> Thread: qemu-system-spa(2743)
>  27.10%  [kernel]             [k] retint_signal
>  12.66%  qemu-system-sparc64  [.] tcg_optimize
>   9.18%  [vdso]               [.] 0x0000000000000998
>   8.39%  [kernel]             [k] _raw_spin_unlock_irqrestore
>   4.76%  qemu-system-sparc64  [.] tcg_liveness_analysis
>   3.89%  qemu-system-sparc64  [.] tcg_reg_alloc_op
>   2.80%  qemu-system-sparc64  [.] tcg_out_opc
>   2.45%  qemu-system-sparc64  [.] get_physical_address_data
>   1.86%  [kernel]             [k] native_read_tsc
>   1.62%  qemu-system-sparc64  [.] tlb_flush_page
>   1.55%  qemu-system-sparc64  [.] tcg_out_modrm_sib_offset.constprop.42
>   1.45%  [unknown]            [.] 0x00000000451c5cae
>   1.43%  qemu-system-sparc64  [.] gen_intermediate_code_pc
>   1.39%  qemu-system-sparc64  [.] tcg_temp_new_internal_i64
>   1.24%  qemu-system-sparc64  [.] tb_flush_jmp_cache
>   1.11%  qemu-system-sparc64  [.] disas_sparc_insn
>   1.08%  qemu-system-sparc64  [.] tcg_out_modrm
>   0.97%  qemu-system-sparc64  [.] tcg_reg_alloc_start
>   0.77%  qemu-system-sparc64  [.] cpu_sparc_exec
>   0.73%  qemu-system-sparc64  [.] replace_tlb_1bit_lru.isra.3
>   0.72%  qemu-system-sparc64  [.] tcg_gen_code_search_pc
>   0.72%  qemu-system-sparc64  [.] tcg_opt_gen_mov
>   0.70%  qemu-system-sparc64  [.] reset_temp
>
> I'm not sure why I still see kernel functions when I zoom into qemu
> thread. Is this qemu signal handling?
> And then it would be interesting to know where in this listing is the
> generated code. Is it [vdso], [unknown] or is it hidden behind
> retint_signal?
>
> Ironically a good optimization target seems to be the tcg_optimize
> function. If I zoom I see it spends most of the time in
> reset_all_temps.
>
> Any suggestions how to improve it?
>

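As a general illustration of why a reset-everything pass such as
reset_all_temps can dominate, and one common way to make it cheaper:
instead of clearing the state of every temp on each reset, clear only a
small bitmap and re-initialise a temp lazily the first time it is
touched. The sketch below is standalone C with made-up names and sizes
(temp_info, temps_used, MAX_TEMPS, get_temp are all hypothetical). It is
not QEMU's actual code, and I am not claiming it is what any particular
patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_TEMPS     512                     /* hypothetical upper bound */
#define BITS_PER_WORD 64
#define SET_WORDS     (MAX_TEMPS / BITS_PER_WORD)

/* Hypothetical per-temp optimizer state (not QEMU's real layout). */
typedef struct {
    bool     is_const;
    uint64_t val;
} temp_info;

static temp_info temps[MAX_TEMPS];
static uint64_t  temps_used[SET_WORDS];       /* 1 bit per temp: state valid */

/* Cheap reset: clear the small bitmap, leave temps[] untouched. */
static void reset_all_temps_lazy(void)
{
    memset(temps_used, 0, sizeof(temps_used));
}

/* Return a temp's state, initialising it on first use after a reset. */
static temp_info *get_temp(size_t idx)
{
    assert(idx < MAX_TEMPS);
    uint64_t bit = UINT64_C(1) << (idx % BITS_PER_WORD);
    if (!(temps_used[idx / BITS_PER_WORD] & bit)) {
        temps[idx].is_const = false;
        temps[idx].val = 0;
        temps_used[idx / BITS_PER_WORD] |= bit;
    }
    return &temps[idx];
}
```

With this layout the per-reset cost depends on the bitmap size rather
than on sizeof(temp_info) times the number of temps; the price is one
extra bit test on every access.
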
Try this patch:

http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02042.html

Aurelien

--
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net