On 31/03/16 16:40, Paolo Bonzini wrote:
>
> On 31/03/2016 15:14, Sergey Fedorov wrote:
>> On 30/03/16 21:13, Paolo Bonzini wrote:
>>> On 30/03/2016 19:08, Sergey Fedorov wrote:
>>>> The second approach is to make 'tb_invalidated_flag' per-CPU. This
>>>> would be conceptually similar to what we have, but would give us thread
>>>> safety. With this approach, we need to be careful to correctly clear and
>>>> set the flag.
>>> You can just ensure that setting and clearing it is done under tb_lock.
>> So it could remain sitting in 'tcg_ctx.tb_ctx'. I'm just wondering what
>> could be real benefits for making it per-CPU then?
> All CPUs need to observe it in order to clear their own local next_tb
> variable. It is not enough to do that once, so it has to be per-CPU.
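A minimal sketch of the per-CPU variant discussed above, assuming POSIX threads. A mutex stands in for tb_lock, and the names `tb_lock_sketch`, `tb_invalidated_notify_all()`, `tb_invalidated_check_and_clear()` and `MAX_VCPUS` are hypothetical, not QEMU's actual code. Every flag is set and cleared under the lock, and each vCPU clears only its own copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: names are illustrative, not QEMU's actual API. */
#define MAX_VCPUS 4

/* Protects tb_invalidated_flag[]; stands in for QEMU's tb_lock. */
static pthread_mutex_t tb_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* One flag per vCPU: set by whichever thread invalidates TBs, cleared
 * only by the vCPU that owns it. */
static bool tb_invalidated_flag[MAX_VCPUS];

/* Called after TBs have been invalidated or flushed: every vCPU must see
 * the event, so each vCPU's own copy of the flag is set. */
void tb_invalidated_notify_all(void)
{
    pthread_mutex_lock(&tb_lock_sketch);
    for (size_t i = 0; i < MAX_VCPUS; i++) {
        tb_invalidated_flag[i] = true;
    }
    pthread_mutex_unlock(&tb_lock_sketch);
}

/* Called by vCPU `idx` before chaining its previous TB to the next one.
 * Returns true if TBs were invalidated since this vCPU last checked; the
 * caller then resets its local next_tb instead of chaining. */
bool tb_invalidated_check_and_clear(size_t idx)
{
    bool invalidated;

    assert(idx < MAX_VCPUS);
    pthread_mutex_lock(&tb_lock_sketch);
    invalidated = tb_invalidated_flag[idx];
    tb_invalidated_flag[idx] = false;
    pthread_mutex_unlock(&tb_lock_sketch);
    return invalidated;
}
```

With a single shared flag, the first vCPU to clear it would hide the event from every other vCPU that still holds a stale next_tb; keeping one copy per vCPU means each of them observes the invalidation exactly once.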
So for each vCPU thread we have a separate flag to clear it safely. Got it, thanks.

>
>>> Because TranslationBlocks live in tcg_ctx.tb_ctx.tbs you need
>>> special code to exit all CPUs at tb_flush time, otherwise you risk that
>>> a tb_alloc reuses a TranslationBlock while it is in use by a VCPU.
>> Looks like no matter which approach we use, it's ultimately necessary to
>> ensure all CPUs have exited from translated code before the translation
>> buffer may be safely flushed.
> My plan was to use some kind of double buffering, where only half of
> code_gen_buffer is in use. At the end of tb_flush you call cpu_exit()
> on all CPUs, so that CPUs stop executing chained TBs from the old half
> before they can see one from the new half.
>
> If code_gen_buffer is static you have to preallocate two buffers (and
> two tbs arrays) and waste one of them; while it is theoretically
> possible to have CPUs still executing from the old half while you finish
> the new half, it can be more or less ignored.
>
> If it is dynamic, the previously used areas can be freed with call_rcu,
> and you can safely allocate a new code_gen_buffer and tbs array.
>
> I haven't thought much about it; it might require keeping a cache of the
> tbs array per CPU, and possibly changing the code under "if
> (tcg_ctx.tb_ctx.tb_invalidated_flag)" to simply exit cpu_exec.

Maybe save this idea for later? :) We'd better start with a simpler approach and then move on to optimizing.

BTW, a few years ago I came across an interesting paper on code cache eviction granularities [1].

[1] http://www.cs.virginia.edu/kim/courses/cs851/papers/hazelwood04mediumgrained.pdf

Kind regards,
Sergey
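The double-buffering idea Paolo describes, for the static code_gen_buffer case, could be sketched roughly as follows. This is a hypothetical single-file model, not QEMU code: `CodeCacheSketch`, `code_alloc()`, `code_flush()`, `exit_request[]` and `cpu_exit_sketch()` are illustrative names (the last standing in for cpu_exit()), and callers are assumed to hold a lock such as tb_lock around allocation and flushing. Only one half receives new code; a flush switches halves and raises an exit request for every vCPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; sizes and names are illustrative, not QEMU's.
 * Assumes callers serialize code_alloc()/code_flush() (e.g. tb_lock). */
#define HALF_SIZE 4096
#define NR_VCPUS 4

typedef struct {
    uint8_t buf[2][HALF_SIZE]; /* static case: two preallocated halves */
    int active;                /* half currently receiving new code */
    size_t used;               /* bytes allocated in the active half */
    unsigned generation;       /* bumped on every flush */
} CodeCacheSketch;

static CodeCacheSketch cache;
static atomic_bool exit_request[NR_VCPUS]; /* polled by each vCPU loop */

/* Stand-in for cpu_exit(): ask vCPU `i` to leave the execution loop so
 * it stops following chained TBs into the old half. */
static void cpu_exit_sketch(int i)
{
    assert(i >= 0 && i < NR_VCPUS);
    atomic_store(&exit_request[i], true);
}

/* Allocate code space in the active half; NULL means it is full and the
 * caller should flush. */
uint8_t *code_alloc(size_t size)
{
    if (size > HALF_SIZE - cache.used) {
        return NULL;
    }
    uint8_t *p = &cache.buf[cache.active][cache.used];
    cache.used += size;
    return p;
}

/* Switch to the other half and kick every vCPU. The old half is only
 * reused on the next flush, by which time every vCPU is assumed to have
 * honored its exit request. */
void code_flush(void)
{
    cache.active ^= 1;
    cache.used = 0;
    cache.generation++;
    for (int i = 0; i < NR_VCPUS; i++) {
        cpu_exit_sketch(i);
    }
}
```

The cost is the one mentioned in the thread: one of the two preallocated halves is always idle. The dynamic variant would instead free the old area with call_rcu and allocate a fresh buffer, which this sketch does not attempt.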