On Mon, Jun 5, 2017 at 6:49 PM, Emilio G. Cota <c...@braap.org> wrote:
> Allocating an arbitrarily-sized array of tbs results in either
> (a) a lot of memory wasted or (b) unnecessary flushes of the code
> cache when we run out of TB structs in the array.
>
> An obvious solution would be to just malloc a TB struct when needed,
> and keep the TB array as an array of pointers (recall that tb_find_pc()
> needs the TB array to run in O(log n)).
>
> Perhaps a better solution, which is implemented in this patch, is to
> allocate TB's right before the translated code they describe. This
> results in some memory waste due to padding to have code and TBs in
> separate cache lines--for instance, I measured 4.7% of padding in the
> used portion of code_gen_buffer when booting aarch64 Linux on a
> host with 64-byte cache lines. However, it can allow for optimizations
> in some host architectures, since TCG backends could safely assume that
> the TB and the corresponding translated code are very close to each
> other in memory. See this message by rth for a detailed explanation:
>
>   https://lists.gnu.org/archive/html/qemu-devel/2017-03/msg05172.html
>   Subject: Re: GSoC 2017 Proposal: TCG performance enhancements
>   Message-ID: <1e67644b-4b30-887e-d329-1848e94c9...@twiddle.net>

Reviewed-by: Pranith Kumar <bobby.pr...@gmail.com>

Thanks for doing this Emilio. Do you plan to continue working on rth's
suggestions in that email? If so, can we co-ordinate our work?

-- 
Pranith

Reply via email to