On 10/06/16 19:15, Alex Bennée wrote:
> Sergey Fedorov <serge.f...@gmail.com> writes:
>
>> On 26/05/16 19:35, Alvise Rigo wrote:
>>> Using tcg_exclusive_{lock,unlock}(), make the emulation of
>>> LoadLink/StoreConditional thread safe.
>>>
>>> During an LL access, this lock protects the load access itself, the
>>> update of the exclusive history and the update of the VCPU's protected
>>> range. In a SC access, the lock protects the store access itself, the
>>> possible reset of other VCPUs' protected range and the reset of the
>>> exclusive context of calling VCPU.
>>>
>>> The lock is also taken when a normal store happens to access an
>>> exclusive page to reset other VCPUs' protected range in case of
>>> collision.
>> I think the key problem here is that the load in LL helper can race with
>> a concurrent regular fast-path store.
(snip)
> I think this can be fixed using async_safe_run_on_cpu and tweaking the
> ldlink helper.
>
> * Change the helper_ldlink call
>   - pass it offset-of(cpu->reg[n]) so it can store result of load
>   - maybe pass it next-pc (unless there is some other way to know)
>
> vCPU runs until the ldlink instruction occurs and jumps to the helper
>
> * Once in the helper_ldlink
>   - queue up an async helper function with info of offset
>   - cpu_loop_exit_restore(with next PC)
>
> The vCPU that issued the ldlink exits immediately, waits until all vCPUs
> are out of generated code.
>
> * Once in helper_ldlink async helper
>   - Everything at this point is quiescent, no vCPU activity
>   - Flush all TLBs/set flags
>   - Do the load from memory, store directly into cpu->reg[n]
>
> The key thing is once we are committed to load in the async helper
> nothing else can get in the way. Any stores before we are in the helper
> happen as normal, once we exit the async helper all potential
> conflicting stores will slow path.
>
> There is a little messing about in knowing the next PC which is simple
> in the ARM case but gets a bit more complicated for architectures that
> have deferred jump slots. I haven't looked into this nit yet.
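
To check that I read the steps above correctly, here is a toy,
single-threaded model of them in plain C. It is only a sketch: none of
these names exist in QEMU. toy_helper_ldlink(), toy_run_safe_work() and
toy_store() are made up for illustration, "leaving generated code" is
just a flag standing in for cpu_loop_exit_restore(), and the quiescence
check is a flag scan rather than real synchronisation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_VCPUS 2
#define NR_REGS  4

/* Toy vCPU state; every name here is hypothetical, not QEMU's. */
typedef struct ToyCPU {
    uint64_t reg[NR_REGS];
    uint64_t pc;
    bool in_generated_code;     /* true while "running translated code" */
    bool work_pending;          /* a deferred LL load is queued */
    size_t work_reg;            /* destination register index */
    const uint64_t *work_addr;  /* address the LL wants to load from */
    uint64_t work_next_pc;      /* where to resume after the LL */
} ToyCPU;

static ToyCPU cpus[NR_VCPUS];
static bool page_exclusive;         /* set: stores must take the slow path */
static unsigned slow_path_stores;   /* counts stores that hit the slow path */

/* Step 1: the LL helper only queues the work and leaves generated code. */
static void toy_helper_ldlink(ToyCPU *cpu, const uint64_t *addr,
                              size_t reg, uint64_t next_pc)
{
    assert(reg < NR_REGS);
    cpu->work_pending = true;
    cpu->work_addr = addr;
    cpu->work_reg = reg;
    cpu->work_next_pc = next_pc;
    cpu->in_generated_code = false; /* stands in for the loop exit */
}

/* Step 2: the deferred work runs only once every vCPU is quiescent. */
static bool toy_run_safe_work(ToyCPU *cpu)
{
    for (size_t i = 0; i < NR_VCPUS; i++) {
        if (cpus[i].in_generated_code) {
            return false;           /* somebody is still running: wait */
        }
    }
    if (!cpu->work_pending) {
        return false;
    }
    page_exclusive = true;                      /* "flush TLBs/set flags" */
    cpu->reg[cpu->work_reg] = *cpu->work_addr;  /* load straight into reg[n] */
    cpu->pc = cpu->work_next_pc;                /* resume after the LL */
    cpu->work_pending = false;
    return true;
}

/* A regular store: fast path unless the page has been flagged. */
static void toy_store(uint64_t *addr, uint64_t val)
{
    if (page_exclusive) {
        slow_path_stores++;         /* the real slow path would check ranges */
    }
    *addr = val;
}
```

In this model a store issued before the deferred work runs goes through
the fast path as usual, and the load then simply observes its value;
any store issued after the work has run is counted as a slow-path store.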

Hmm, this looks pretty similar to what linux-user code actually does,
e.g. in do_strex(). Just restarting the LL instruction as Alvise
suggests may well be an easier approach (or may not).

Kind regards,
Sergey