> -----Original Message-----
> From: Richard Henderson <richard.hender...@linaro.org>
> Sent: Monday, April 18, 2022 10:38 AM
> To: Taylor Simpson <tsimp...@quicinc.com>; qemu-devel@nongnu.org
> Cc: Philippe Mathieu-Daudé <f4...@amsat.org>
> Subject: Re: Question about direct block chaining
>
> On 4/18/22 07:54, Taylor Simpson wrote:
> > I implemented both approaches for inner loops and didn't see speedup
> > in my benchmark.  So, I have a couple of questions
> > 1) What are the pros and cons of the two approaches
> > (lookup_and_goto_ptr and goto_tb + exit_tb)?
>
> goto_tb can only be used within a single page (plus other restrictions, see
> translator_use_goto_tb).  In addition, as documented, the change in cpu
> state must be constant, beginning with a direct jump.
>
> lookup_and_goto_ptr can handle any change in cpu state, including indirect
> jumps.
>
> > 2) How can I verify that direct block chaining is working properly?
> > With -d exec, I see lines like the following with goto_tb + exit_tb
> > but NOT lookup_and_goto_ptr
> > Linking TBs 0x7fda44172e00 [0050ac38] index 1 -> 0x7fda44173b40 [0050ac6c]
>
> Well, that's one way.  I would have also suggested simply looking at -d op
> output, for the various branchy cases you're considering, to see that all of
> the exits are as expected.
Thanks!!

I created a synthetic benchmark with a loop with a very small body and a very high number of iterations.  I can see differences in execution time.  Here are my observations:
- goto_tb + exit_tb gives the fastest execution time because it will patch the native jump address
- lookup_and_goto_ptr is an improvement over tcg_gen_exit_tb(NULL, 0)

Taylor
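
For readers following the thread, the ordering in the observations above can be illustrated with a toy model. This is only a sketch and not QEMU code: ToyTB, toy_lookup, toy_run, the MODE_* constants and the two counters are all hypothetical names. It assumes that a plain exit returns to the outer execution loop and looks the next block up again on every iteration, that lookup_and_goto_ptr still does the lookup but skips the return to the outer loop, and that goto_tb + exit_tb pays for one return and one lookup, after which the patched jump is followed directly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in QEMU. */
typedef struct ToyTB ToyTB;
struct ToyTB {
    unsigned pc;     /* guest address this block was translated from */
    ToyTB *chained;  /* patched direct-jump target, NULL until linked */
};

enum { MODE_EXIT_TB_ONLY, MODE_LOOKUP_AND_GOTO_PTR, MODE_GOTO_TB };

static ToyTB loop_tb = { 0x1000, NULL };
static long main_loop_returns;  /* trips back to the outer execution loop */
static long lookups;            /* block lookups keyed by guest pc */

/* Stand-in for finding the translated block for a guest address. */
static ToyTB *toy_lookup(unsigned pc)
{
    lookups++;
    return pc == loop_tb.pc ? &loop_tb : NULL;
}

/* Take the loop's back edge 'iters' times using the given exit strategy. */
static void toy_run(int mode, long iters)
{
    ToyTB *tb;
    long i;

    main_loop_returns = 0;
    lookups = 0;
    loop_tb.chained = NULL;

    tb = toy_lookup(loop_tb.pc);  /* first entry from the outer loop */
    assert(tb != NULL);

    for (i = 0; i < iters; i++) {
        switch (mode) {
        case MODE_EXIT_TB_ONLY:
            /* Unchained exit: back to the outer loop, then look up again. */
            main_loop_returns++;
            tb = toy_lookup(tb->pc);
            break;
        case MODE_LOOKUP_AND_GOTO_PTR:
            /* Look up the next block, but jump to it without returning. */
            tb = toy_lookup(tb->pc);
            break;
        case MODE_GOTO_TB:
            /* First pass links the blocks; later passes follow the link. */
            if (tb->chained == NULL) {
                main_loop_returns++;
                tb->chained = toy_lookup(tb->pc);
            }
            tb = tb->chained;
            break;
        }
    }
}
```

Calling toy_run(mode, n) and then reading lookups and main_loop_returns shows the per-iteration cost each strategy avoids in this model. It says nothing about real timings, which depend on the host and on how cheap the lookup is.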