Hi Richard,

On Sun, Aug 23, 2015 at 2:41 AM, Richard Henderson <r...@twiddle.net> wrote:
> On Aug 22, 2015 9:45 AM, Artyom Tarasenko <atar4q...@gmail.com> wrote:
>> For my test case tcg-indirect brings more performance gain than for Dennis:
>>
>> git master: 18m31s
>> tcg-indirect: 16m50s
>> #undef USE_TCG_OPTIMIZATIONS: 14m18s
>
> Thanks. That's useful.
>
>>
>>
>> JIT statistic, before starting the test:
>> (qemu) info jit
>> Translation buffer state:
>> gen code size 31851136/314448896
>> TB count 128224/2457592
>> TB avg target size 18 max=704 bytes
>> TB avg host size 248 bytes (expansion ratio: 13.4)
>> cross page TB count 0 (0%)
>> direct jump count 83840 (65%) (2 jumps=64730 50%)
>>
>> Statistics:
>> TB flush count 5
>> TB invalidate count 317160
>> TLB flush count 1180769
>> [TCG profiler not compiled]
>>
>> After
>> (qemu) info jit
>> Translation buffer state:
>> gen code size 282903344/314448896
>> TB count 1139744/2457592
>> TB avg target size 17 max=704 bytes
>> TB avg host size 248 bytes (expansion ratio: 14.0)
>> cross page TB count 0 (0%)
>> direct jump count 739828 (64%) (2 jumps=569074 49%)
>>
>> Statistics:
>> TB flush count 5
>> TB invalidate count 324362
>> TLB flush count 2050744
>>
>> So, TB invalidate count gained only ~ 5000.
>> Yet tcg_optimize is ~7% in the perf top, and tcg_liveness_analysis
>> ~3%. Why do we translate so much?
>
> I don't know. It must be something SPARC specific, as I don't see so much
> for alpha.
After some debugging, I think it's caused by memory faults. On every MMU miss or access fault, the TB is retranslated multiple times until the faulting instruction is found. This happens in gen_intermediate_code_internal when it's called with spc == true.

AFAICT we produce data/access faults only on load/store instructions, i.e. if GET_FIELD(insn, 0, 1) == 3. Can this knowledge be used to reduce the number of retranslations?

Artyom

--
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu
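A minimal sketch of the classification described above, assuming the GET_FIELD macro here behaves like the one in QEMU's SPARC translator. SPARC numbers instruction bits from the most significant end, so bits 0..1 are the `op` field, and op == 3 selects the format-3 load/store group. `insn_may_data_fault` and `count_fault_candidates` are hypothetical names for illustration, not existing QEMU functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract bits FROM..TO (inclusive) of a 32-bit instruction word, using
 * SPARC bit numbering where bit 0 is the most significant bit.
 * Assumed to behave like GET_FIELD in QEMU's SPARC translator. */
#define GET_FIELD(X, FROM, TO) \
    (((X) >> (31 - (TO))) & ((1u << ((TO) - (FROM) + 1)) - 1))

/* Compile-time sanity check: `ld [%o0], %o1` encodes as 0xD2022000. */
static_assert(GET_FIELD(0xD2022000u, 0, 1) == 3, "ld must decode as op == 3");

/* Hypothetical helper: op == 3 is the load/store group, the only
 * instructions that can raise a data/access fault. */
static int insn_may_data_fault(uint32_t insn)
{
    return GET_FIELD(insn, 0, 1) == 3;
}

/* Hypothetical helper: count the instructions in a guest block that could
 * raise a data/access fault, i.e. the only points a search-PC retranslation
 * would need to stop at for such a fault. */
static size_t count_fault_candidates(const uint32_t *insns, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (insn_may_data_fault(insns[i])) {
            count++;
        }
    }
    return count;
}
```

This only covers data/access faults. Other traps, such as division by zero from `udiv`/`sdiv` or register-window traps from `save`/`restore`, come from op == 2 instructions, so any shortcut in the retranslation path would still have to be checked against how those are raised.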