On 27/11/2017 15:47, Peter Maydell wrote: > On 27 November 2017 at 15:38, Paolo Bonzini <pbonz...@redhat.com> wrote: >> On 24/11/2017 18:18, Peter Maydell wrote: >>> * threads A & B both try to do a write to a page with code in it at >>> the same time (ie which we've made non-writeable, so SEGV) >>> * they race into the signal handler with this faulting address >>> * thread A happens to get to page_unprotect() first and takes the >>> mmap lock, so thread B sits waiting for it to be done >>> * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable >>> * A can then continue OK (returns from signal handler to retry the >>> memory access) >>> * ...but when B gets the mmap lock it finds that the page is already >>> PAGE_WRITE, and so it exits page_unprotect() via the "not due to >>> protected translation" code path, and wrongly delivers the signal >>> to the guest rather than just retrying the access >>> >>> I'm not sure how best to fix this. We could make page_unprotect() >>> say "if PAGE_WRITE is set, assume this call raced with another one >>> and say 'this was caused by protected translation' without doing >>> anything". >> >> Yes, I think this is the only solution since SIGSEGV is raised >> asynchronously. Even using a trylock would only narrow the race window >> but not fix it. > > I have a patch from rth based on an idea he and I came up with: > we add a field to the PageDesc struct to store the thread id of > the thread that last touches the flags. If you come into the > segv handler and the page flags/last-modified-by field say "should be > writeable and somebody else updated it" then you mark the page as > "last modified by this thread" and retry the access. If the > flags say "should be writeable, last modified by this thread" > then you know the page state hasn't changed since this thread > last saw it as "definitely not causing segvs because of cached TBs", > and so that should be passed on as a guest SEGV.
Clever, but why would si_code not work?... Paolo