Re: [Qemu-devel] [RFC PATCH V7 00/19] Multithread TCG.

Claudio Fontana Wed, 21 Oct 2015 08:12:13 -0700

On 07.10.2015 16:52, Frederic Konrad wrote:
> Hi Claudio,
> 
> I'll rebase soon, tomorrow with a bit of luck ;).
> 
> Thanks,
> Fred


A respectful ping on this one :-)

I am looking at http://git.greensocs.com/fkonrad/mttcg.git
branch multi_tcg_v7_bugfixed;
is there anything newer?

Ciao,

Claudio

> 
> On 07/10/2015 14:46, Claudio Fontana wrote:
>> Hello Frederic,
>>
>> On 11.08.2015 08:27, Frederic Konrad wrote:
>>> On 11/08/2015 08:15, Benjamin Herrenschmidt wrote:
>>>> On Mon, 2015-08-10 at 17:26 +0200, fred.kon...@greensocs.com wrote:
>>>>> From: KONRAD Frederic <fred.kon...@greensocs.com>
>>>>>
>>>>> This is the 7th round of the MTTCG patch series.
>>>>>
>>>>>
>>>>> It can be cloned from:
>>>>> g...@git.greensocs.com:fkonrad/mttcg.git branch multi_tcg_v7.
>> Would it be possible to rebase on the latest QEMU? I wonder if MTTCG is
>> diverging a bit too much from mainline, which will make it more difficult
>> to rebase later. (Or did I get confused about all these repos?)
>>
>> Thank you!
>>
>> Claudio
>>
>>>>> This patch set tries to address the different issues in the global
>>>>> picture of MTTCG, as presented on the wiki.
>>>>>
>>>>> == Needed patch for our work ==
>>>>>
>>>>> Some preliminaries are needed for our work:
>>>>>   * current_cpu doesn't make sense in MTTCG, so a tcg_executing flag is
>>>>>     added to the CPUState.
>>>> Can't you just make it a TLS (thread-local) variable?
>>> True, that can be done as well. But the tcg_exec_flags also has a second
>>> meaning: "you can't start executing code right now because I want to do a
>>> safe_work".
>>>>>   * We need to run some work safely when all VCPUs are outside their
>>>>>     execution loop. This is done with the async_run_safe_work_on_cpu
>>>>>     function introduced in this series.
>>>>>   * A QemuSpin lock is introduced (POSIX only for now) to allow faster
>>>>>     handling of atomic instructions.
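For readers unfamiliar with the idea, a spin lock of the kind mentioned in the last bullet can be sketched with C11 atomics. This is a minimal illustration, not the QemuSpin code from the series; the type and function names (SpinSketch, spin_sketch_*) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spin lock sketch; not the actual QemuSpin implementation. */
typedef struct {
    atomic_bool locked;
} SpinSketch;

static void spin_sketch_init(SpinSketch *s)
{
    atomic_init(&s->locked, false);
}

/* Test-and-test-and-set: try to grab the lock with an atomic exchange and,
 * if it is taken, spin on a relaxed load so waiters do not keep writing
 * to the shared cache line. */
static void spin_sketch_lock(SpinSketch *s)
{
    for (;;) {
        if (!atomic_exchange_explicit(&s->locked, true,
                                      memory_order_acquire)) {
            return; /* we flipped false -> true: lock acquired */
        }
        while (atomic_load_explicit(&s->locked, memory_order_relaxed)) {
            /* busy-wait until the holder releases it */
        }
    }
}

/* Non-blocking attempt: returns true if the lock was acquired. */
static bool spin_sketch_trylock(SpinSketch *s)
{
    return !atomic_exchange_explicit(&s->locked, true, memory_order_acquire);
}

/* Release ordering publishes the critical section's writes before the
 * lock is seen as free by the next acquirer. */
static void spin_sketch_unlock(SpinSketch *s)
{
    /* Unlocking a lock nobody holds is a caller bug. */
    assert(atomic_load_explicit(&s->locked, memory_order_relaxed));
    atomic_store_explicit(&s->locked, false, memory_order_release);
}
```

The attraction over a mutex for short atomic-instruction emulation paths is that an uncontended acquire is a single atomic exchange with no syscall.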
>>>> How do you handle the memory model? I.e., ARM and PPC are out of order
>>>> while x86 is (mostly) in order, so emulating ARM/PPC on x86 is fine, but
>>>> emulating x86 on ARM or PPC will lead to problems unless you generate
>>>> memory barriers with every load/store...
>>> For the moment we are only trying to handle the first case.
>>>> At least on POWER7 and later on PPC we have the possibility of setting
>>>> the attribute "Strong Access Ordering" with mremap/mprotect (I don't
>>>> remember which one), which gives us x86-like memory semantics...
>>>>
>>>> I don't know if ARM supports something similar. On the other hand, when
>>>> emulating ARM on PPC or vice-versa, we can probably get away with no
>>>> barriers.
>>>>
>>>> Do you expose some kind of guest memory model info to the TCG backend so
>>>> it can decide how to handle these things?
>>>>
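To make the x86-on-ARM/PPC problem concrete: x86 keeps loads ordered with other loads and stores ordered with earlier loads and stores, and only lets a later load pass an earlier store, whereas ARM and PPC may reorder all of these. A minimal sketch, assuming hypothetical helpers that an emulator could call for every guest load and store on a weakly ordered host, uses C11 fences to recover x86-like ordering. It illustrates the cost Ben describes (a barrier on every access); it is not how TCG emits code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest RAM, modelled as 32-bit words accessed with relaxed
 * atomics so that the C11 fences below have well-defined meaning. */
#define GUEST_WORDS 1024
static _Atomic uint32_t guest_ram[GUEST_WORDS];

/* x86-style load on a weakly ordered host: the acquire fence keeps later
 * loads and stores from being performed before this load. */
static uint32_t guest_load_tso(size_t idx)
{
    assert(idx < GUEST_WORDS);
    uint32_t v = atomic_load_explicit(&guest_ram[idx], memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}

/* x86-style store: the release fence keeps earlier loads and stores from
 * being performed after this store. A later load may still pass this
 * store, which x86 itself also allows. */
static void guest_store_tso(size_t idx, uint32_t val)
{
    assert(idx < GUEST_WORDS);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&guest_ram[idx], val, memory_order_relaxed);
}
```

In the opposite direction (an ARM/PPC guest on an x86 host, the case Fred says they target first), plain guest loads and stores need no such fences, because the host already orders them at least as strongly as the guest requires.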
>>>>> == Code generation and cache ==
>>>>>
>>>>> As QEMU stands, there is no protection at all against two threads
>>>>> attempting to generate code at the same time or modifying a
>>>>> TranslationBlock. The "protect TBContext with tb_lock" patch addresses
>>>>> the issue of code generation and makes all the tb_* functions thread-safe
>>>>> (except tb_flush).
>>>>> This raised the question of one or multiple caches. We chose to use one
>>>>> unified cache because it's easier as a first step, and since the
>>>>> structure of QEMU effectively has a ‘local’ cache per CPU in the form of
>>>>> the jump cache, we don't see the benefit of having two pools of TBs.
>>>>>
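The single-cache design can be modelled in a few lines: one shared table guarded by a mutex (playing the part of tb_lock), plus a per-vCPU jump cache that is read without any lock because only its owning thread touches it. All names are hypothetical, "code generation" is reduced to allocating a descriptor, and tb_flush is not modelled; this sketches the locking structure only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified translation-block descriptor. */
typedef struct TBSketch {
    uint64_t pc;              /* guest PC this block translates */
    struct TBSketch *next;    /* hash-bucket chain in the shared cache */
} TBSketch;

#define TB_HASH_SIZE   256
#define JMP_CACHE_SIZE 64

/* One unified cache shared by all vCPU threads, guarded by tb_lock. */
static TBSketch *tb_hash[TB_HASH_SIZE];
static pthread_mutex_t tb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-vCPU jump cache: only the owner thread touches it, so no lock. */
typedef struct {
    TBSketch *jmp_cache[JMP_CACHE_SIZE];
} CPUSketch;

static TBSketch *tb_find_or_generate(CPUSketch *cpu, uint64_t pc)
{
    unsigned j = (unsigned)(pc % JMP_CACHE_SIZE);
    TBSketch *tb = cpu->jmp_cache[j];
    if (tb && tb->pc == pc) {
        return tb;            /* fast path: lock-free, CPU-local */
    }

    pthread_mutex_lock(&tb_lock);
    unsigned h = (unsigned)(pc % TB_HASH_SIZE);
    for (tb = tb_hash[h]; tb; tb = tb->next) {
        if (tb->pc == pc) {
            break;
        }
    }
    if (!tb) {
        /* "Code generation" happens under the lock, so two threads can
         * never translate into, or modify, the shared cache at once. */
        tb = calloc(1, sizeof(*tb));
        assert(tb != NULL);
        tb->pc = pc;
        tb->next = tb_hash[h];
        tb_hash[h] = tb;
    }
    pthread_mutex_unlock(&tb_lock);

    cpu->jmp_cache[j] = tb;   /* refill this vCPU's local cache */
    return tb;
}
```

Two vCPUs asking for the same PC get the same descriptor, while repeated lookups by one vCPU are served from its jump cache, which is why a second pool of TBs adds little.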
>>>>> == Dirty tracking ==
>>>>>
>>>>> Protecting the IOs:
>>>>> To allow all VCPU threads to run at the same time we need to drop the
>>>>> global_mutex as soon as possible. I/O accesses need to take the mutex.
>>>>> This is likely to change when
>>>>> http://thread.gmane.org/gmane.comp.emulators.qemu/345258
>>>>> is upstreamed.
>>>>>
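A minimal sketch of "drop the global mutex, retake it for I/O", assuming a hypothetical device model whose state is protected by a single big lock: vCPU threads run translated code without the lock and take it only around the device access itself. The names (global_mutex here, DeviceSketch, mmio_*_sketch) are illustrative, not QEMU's.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "big lock" protecting all device state. */
static pthread_mutex_t global_mutex = PTHREAD_MUTEX_INITIALIZER;

#define DEV_NREGS 4

typedef struct {
    uint32_t regs[DEV_NREGS];   /* protected by global_mutex */
} DeviceSketch;

/* Called from a vCPU thread that runs translated code WITHOUT holding
 * global_mutex; only the device access itself is serialized. */
static void mmio_write_sketch(DeviceSketch *dev, size_t reg, uint32_t val)
{
    assert(reg < DEV_NREGS);
    pthread_mutex_lock(&global_mutex);
    dev->regs[reg] = val;
    pthread_mutex_unlock(&global_mutex);
}

static uint32_t mmio_read_sketch(DeviceSketch *dev, size_t reg)
{
    uint32_t v;

    assert(reg < DEV_NREGS);
    pthread_mutex_lock(&global_mutex);
    v = dev->regs[reg];
    pthread_mutex_unlock(&global_mutex);
    return v;
}
```

Plain RAM accesses from translated code would not take the lock at all; that is what lets the vCPU threads run in parallel.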
>>>>> Invalidation of TranslationBlocks:
>>>>> We can have all VCPUs running during an invalidation. Each VCPU is able
>>>>> to clean its jump cache itself, as it is in CPUState, so that can be
>>>>> handled by a simple call to async_run_on_cpu. However, tb_invalidate also
>>>>> writes to the TranslationBlock, which is shared since we have only one
>>>>> pool. This part of the invalidation therefore requires all VCPUs to exit
>>>>> before it can be done, which is why async_run_safe_work_on_cpu is
>>>>> introduced to handle this case.
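The jump-cache half of the invalidation fits a per-vCPU work queue in the spirit of async_run_on_cpu: any thread may queue work, and only the owning vCPU thread runs it, so the work can touch CPU-local state such as the jump cache without further locking. This is a sketch under stated assumptions: all names are hypothetical, and the "kick" that forces a running vCPU out of its execution loop is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical queued work item. */
typedef struct WorkItem {
    void (*func)(void *cpu, void *data);
    void *data;
    struct WorkItem *next;
} WorkItem;

#define JMP_CACHE_SIZE 64

typedef struct {
    pthread_mutex_t work_mutex;   /* protects head/tail only */
    WorkItem *head, *tail;        /* FIFO of pending work */
    void *jmp_cache[JMP_CACHE_SIZE];
} VCPUSketch;

static void vcpu_init(VCPUSketch *cpu)
{
    memset(cpu, 0, sizeof(*cpu));
    pthread_mutex_init(&cpu->work_mutex, NULL);
}

/* Queue work for 'cpu'; safe to call from any thread. */
static void queue_work_sketch(VCPUSketch *cpu,
                              void (*func)(void *, void *), void *data)
{
    WorkItem *wi = malloc(sizeof(*wi));
    assert(wi != NULL);
    wi->func = func;
    wi->data = data;
    wi->next = NULL;

    pthread_mutex_lock(&cpu->work_mutex);
    if (cpu->tail) {
        cpu->tail->next = wi;
    } else {
        cpu->head = wi;
    }
    cpu->tail = wi;
    pthread_mutex_unlock(&cpu->work_mutex);
}

/* Run by the vCPU thread itself, between executions of translated code. */
static void process_work_sketch(VCPUSketch *cpu)
{
    pthread_mutex_lock(&cpu->work_mutex);
    WorkItem *wi = cpu->head;
    cpu->head = cpu->tail = NULL;
    pthread_mutex_unlock(&cpu->work_mutex);

    while (wi) {
        WorkItem *next = wi->next;
        wi->func(cpu, wi->data);  /* runs without work_mutex held */
        free(wi);
        wi = next;
    }
}

/* Example work: the owner clears its own jump cache. */
static void flush_jmp_cache_work(void *opaque, void *data)
{
    VCPUSketch *cpu = opaque;
    (void)data;
    memset(cpu->jmp_cache, 0, sizeof(cpu->jmp_cache));
}
```

The shared TranslationBlock itself cannot be handled this way, since another vCPU may be executing or chaining to it; that part needs every vCPU to be out of the execution loop.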
>>>> What about the host MMU emulation? Is that multithreaded? It has
>>>> potential issues when doing things like dirty bit updates into guest
>>>> memory; those need to be done atomically. Also, TLB invalidations on ARM
>>>> and PPC are global, so they will need to invalidate the remote SW TLBs
>>>> as well.
>>>>
>>>> Do you have a mechanism to synchronize with another thread? I.e., make it
>>>> pop out of TCG if already in and prevent it from getting in? That way
>>>> you can "remotely" invalidate its TLB...
>>> Yes, that's what the safe_work is doing: ask everybody to exit, prevent
>>> VCPUs from resuming (tcg_exec_flag), and do the work when everybody is
>>> outside cpu-exec.
>>>
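The safe-work protocol Fred describes (stop new entries, wait until every vCPU is outside the execution loop, run the work, let them resume) can be sketched with a mutex, a condition variable and a counter. This is a minimal model, not the series' async_run_safe_work_on_cpu: all names are hypothetical, safe_work_pending stands in for the tcg_exec_flag role, and kicking running vCPUs out of translated code is left out. It also shows Ben's TLS suggestion as a _Thread_local per-thread CPU index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t exec_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t exec_cond = PTHREAD_COND_INITIALIZER;
static int vcpus_executing;      /* vCPUs currently inside the exec loop */
static bool safe_work_pending;   /* "don't start executing right now" */

/* Per-thread notion of the current CPU, as thread-local storage. */
static _Thread_local int current_cpu_index = -1;

static void vcpu_exec_enter(int cpu_index)
{
    pthread_mutex_lock(&exec_lock);
    while (safe_work_pending) {
        pthread_cond_wait(&exec_cond, &exec_lock); /* blocked from resuming */
    }
    vcpus_executing++;
    current_cpu_index = cpu_index;
    pthread_mutex_unlock(&exec_lock);
}

static void vcpu_exec_exit(void)
{
    pthread_mutex_lock(&exec_lock);
    assert(vcpus_executing > 0);
    if (--vcpus_executing == 0) {
        pthread_cond_broadcast(&exec_cond);        /* wake a requester */
    }
    current_cpu_index = -1;
    pthread_mutex_unlock(&exec_lock);
}

/* Run func(data) once no vCPU is executing. Must not be called from a
 * thread that is itself between vcpu_exec_enter() and vcpu_exec_exit(). */
static void run_safe_work(void (*func)(void *), void *data)
{
    pthread_mutex_lock(&exec_lock);
    while (safe_work_pending) {                    /* one request at a time */
        pthread_cond_wait(&exec_cond, &exec_lock);
    }
    safe_work_pending = true;                      /* stop new entries */
    while (vcpus_executing > 0) {
        pthread_cond_wait(&exec_cond, &exec_lock);
    }
    func(data);                                    /* nobody is executing */
    safe_work_pending = false;
    pthread_cond_broadcast(&exec_cond);            /* let vCPUs resume */
    pthread_mutex_unlock(&exec_lock);
}

/* Example work item: increments the int it is given. */
static void example_work_increment(void *data)
{
    (*(int *)data)++;
}
```

Every wait sits in a loop that rechecks its condition, so a single condition variable with broadcasts is enough for both the requester and the blocked vCPUs.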
>>>>> == Atomic instruction ==
>>>>>
>>>>> For now only ARM on x64 is supported, by using a cmpxchg instruction.
>>>>> Specifically, the limitation of this approach is that it is harder to
>>>>> support 64-bit ARM on a host architecture that is multi-core but only
>>>>> supports 32-bit cmpxchg (we believe this could be the case for some PPC
>>>>> cores).
>>>> Right, on the other hand 64-bit will do fine. But then x86 has 2-value
>>>> atomics nowadays, doesn't it? And that will be hard to emulate on
>>>> anything. You might need to have some kind of global hashed lock list
>>>> used by atomics (hash the physical address) as a fallback if you don't
>>>> have a 1:1 match between host and guest capabilities.
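Ben's fallback, a global table of locks indexed by a hash of the guest physical address, might look like the sketch below for a two-word compare-and-swap (the kind of operation x86's CMPXCHG16B performs). It is a hypothetical illustration: it assumes the address is 16-byte aligned, and it is only correct if every other access to those words, including plain stores, also goes through the same lock or is otherwise excluded.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define ATOMIC_LOCK_BITS  6
#define ATOMIC_LOCK_COUNT (1u << ATOMIC_LOCK_BITS)

/* Hypothetical hashed lock table shared by all vCPU threads. */
static pthread_mutex_t atomic_locks[ATOMIC_LOCK_COUNT];

static void atomic_locks_init(void)
{
    for (unsigned i = 0; i < ATOMIC_LOCK_COUNT; i++) {
        pthread_mutex_init(&atomic_locks[i], NULL);
    }
}

/* Hash by 16-byte granule so all bytes of one wide access share a lock
 * (multiplicative hashing, keeping the top ATOMIC_LOCK_BITS bits). */
static pthread_mutex_t *lock_for_paddr(uint64_t paddr)
{
    uint64_t granule = paddr >> 4;
    unsigned idx = (unsigned)((granule * 0x9E3779B97F4A7C15ull)
                              >> (64 - ATOMIC_LOCK_BITS));
    assert(idx < ATOMIC_LOCK_COUNT);
    return &atomic_locks[idx];
}

/* Emulated double-word CAS: 'mem' is the host view of guest memory at
 * guest physical address 'paddr'. Returns 1 on success, 0 on mismatch. */
static int cmpxchg_double_sketch(uint64_t paddr, uint64_t *mem,
                                 uint64_t old0, uint64_t old1,
                                 uint64_t new0, uint64_t new1)
{
    pthread_mutex_t *l = lock_for_paddr(paddr);
    int ok;

    assert((paddr & 15) == 0);   /* must not straddle two granules */
    pthread_mutex_lock(l);
    ok = (mem[0] == old0 && mem[1] == old1);
    if (ok) {
        mem[0] = new0;
        mem[1] = new1;
    }
    pthread_mutex_unlock(l);
    return ok;
}
```

Hashing keeps the table small while still letting unrelated addresses proceed in parallel most of the time; two addresses that collide merely serialize.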
>>> VOS did a "Slow path for atomic instruction translation" series, which you
>>> can find here:
>>> https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg00971.html
>>>
>>> That is what will be used in the end.
>>>
>>> Thanks,
>>> Fred
>>>> Cheers,
>>>> Ben.
> 


-- 
Claudio Fontana
Server Virtualization Architect
Huawei Technologies Duesseldorf GmbH
Riesstraße 25 - 80992 München
