On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <c...@braap.org> wrote:
> Allocating an arbitrarily-sized array of tbs results in either
> (a) a lot of memory wasted or (b) unnecessary flushes of the code
> cache when we run out of TB structs in the array.
>
> An obvious solution would be to just malloc a TB struct when needed,
> and keep the TB array as an array of pointers (recall that tb_find_pc()
> needs the TB array to run in O(log n)).
>
> Perhaps a better solution, which is implemented in this patch, is to
> allocate TB's right before the translated code they describe. This
> results in some memory waste due to padding to have code and TBs in
> separate cache lines--for instance, I measured 4.7% of padding in the
> used portion of code_gen_buffer when booting aarch64 Linux on a
> host with 64-byte cache lines. However, it can allow for optimizations
> in some host architectures, since TCG backends could safely assume that
> the TB and the corresponding translated code are very close to each
> other in memory. See this message by rth for a detailed explanation:
>
> https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
> Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
> Message-ID: <1e67644b-4b30-887e-d329-1848e94c9...@twiddle.net>
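
For anyone following along, the layout described in the quoted text can be
sketched roughly as below. This is only an illustration under stated
assumptions, not code from the patch: DemoTB, DemoCodeBuf and demo_tb_alloc
are hypothetical names, and the 64-byte cache line is assumed to match the
host used for the measurement above. Each TB descriptor is placed on its own
cache line, and its translated code starts on the next cache-line boundary,
which is where the padding overhead comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u /* assumed host cache-line size */

/* Hypothetical stand-in for a TB descriptor; fields are illustrative only. */
typedef struct DemoTB {
    uint64_t guest_pc;  /* guest address this block translates */
    void *code;         /* start of the translated host code */
    size_t code_size;   /* bytes reserved for host code */
} DemoTB;

/* Hypothetical stand-in for the code buffer: one region, bump-allocated. */
typedef struct DemoCodeBuf {
    uint8_t *base;
    size_t size;
    size_t used;        /* bytes consumed so far, including padding */
} DemoCodeBuf;

/* Round addr up to a multiple of align (align must be a power of two). */
static uintptr_t round_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Place a descriptor at the next cache-line boundary, then reserve
 * code_size bytes starting at the following boundary, so descriptor and
 * code never share a cache line. Returns NULL when the buffer is full;
 * a real translator would flush the code cache at that point.
 */
static DemoTB *demo_tb_alloc(DemoCodeBuf *buf, uint64_t guest_pc,
                             size_t code_size)
{
    uintptr_t base = (uintptr_t)buf->base;
    uintptr_t tb_addr = round_up(base + buf->used, CACHE_LINE);
    uintptr_t code_addr = round_up(tb_addr + sizeof(DemoTB), CACHE_LINE);
    size_t code_off = (size_t)(code_addr - base);

    if (code_off > buf->size || code_size > buf->size - code_off) {
        return NULL;
    }

    DemoTB *tb = (DemoTB *)tb_addr;
    tb->guest_pc = guest_pc;
    tb->code = (void *)code_addr;
    tb->code_size = code_size;
    buf->used = code_off + code_size;
    return tb;
}
```

Because descriptors are handed out in increasing address order, a lookup by
host PC could still be done with a binary search over a separate array of
descriptor pointers; that part is left out of the sketch.
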
Reviewed-by: Pranith Kumar <bobby.pr...@gmail.com>

Thanks for doing this Emilio. Do you plan to continue working on rth's
suggestions in that email? If so, can we co-ordinate our work?

--
Pranith