Hello,

As discussed on IRC, here is a tentative fix for the concurrent code
patching race. It helps with the x86_64 .NET app on s390x and passes
check-tcg.

Bug report: 
https://lists.nongnu.org/archive/html/qemu-devel/2021-08/msg00644.html

IRC log:
========
<stsquad> iii: my initial thoughts are there is a race in tb_page_add because 
while we will have flushed all the old pages this new corrupted page gets added 
the new corrupted one gets in
<iii> stsquad: I think you are right that it can be considered a tb_page_add 
race. Would it be reasonable to solve it by marking the page read-only before 
translation and then making sure that it doesn't get its PAGE_WRITE back until 
translation is complete?
<rth> iii, stsquad: 
https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg07995.html
<rth> iii: yes, making the page read-only early is the fix, i think.  i believe 
we already hold the mmap_lock around translation, so that should make a writer 
fault and then wait on the mmap_lock.
<iii> rth: Thanks, let me give it a try. I'll post whatever I come up with as 
an RFC patch to qemu-devel.
<rth> thanks
<stsquad> rth: doesn't that serialise all translation again?
<stsquad> rth: we could page lock instead?
<rth> stsquad: i thought we were talking about user-only, where translation is 
always serial.
<rth> stsquad: the link from january is a system-mode bug of the same kind, 
where, yes, we need to hold the page lock or something.
<stsquad> rth: ahh yes because we don't have zoned translation caches...
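
To make the idea from the log concrete, here is a toy model of the ordering
being proposed: clear the write permission before reading the guest code, and
let a later store take a fault path that has to wait for the same lock. This
is only a sketch and not the code from the patch. ToyPage, TOY_PAGE_WRITE,
toy_translate and toy_store are hypothetical names, the mutex merely stands in
for mmap_lock, and the toy takes the lock on every store, whereas a real store
to a writable page is a plain host store with no locking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE  16
#define TOY_PAGE_WRITE 0x2  /* hypothetical flag bit, not QEMU's definition */

typedef struct {
    pthread_mutex_t lock;         /* stands in for mmap_lock */
    int flags;                    /* TOY_PAGE_WRITE set => stores do not fault */
    bool has_tb;                  /* a translation of this page exists */
    uint8_t code[TOY_PAGE_SIZE];  /* guest code bytes */
} ToyPage;

static void toy_page_init(ToyPage *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->flags = TOY_PAGE_WRITE;
    p->has_tb = false;
    memset(p->code, 0, sizeof(p->code));
}

/*
 * Translator: drop write permission first, then read the code bytes.
 * Because both steps happen under the lock, no store can slip in between
 * the permission change and the read.
 */
static void toy_translate(ToyPage *p, uint8_t *snapshot)
{
    pthread_mutex_lock(&p->lock);
    p->flags &= ~TOY_PAGE_WRITE;
    memcpy(snapshot, p->code, TOY_PAGE_SIZE);
    p->has_tb = true;
    pthread_mutex_unlock(&p->lock);
}

/*
 * Writer: a store to a page without TOY_PAGE_WRITE takes the "fault" path,
 * which drops the stale translation and restores write permission before
 * the store is performed. Returns true if the fault path was taken.
 */
static bool toy_store(ToyPage *p, size_t off, uint8_t val)
{
    bool faulted;

    assert(off < TOY_PAGE_SIZE);
    pthread_mutex_lock(&p->lock);   /* a faulting writer waits here */
    faulted = !(p->flags & TOY_PAGE_WRITE);
    if (faulted) {
        p->has_tb = false;
        p->flags |= TOY_PAGE_WRITE;
    }
    p->code[off] = val;
    pthread_mutex_unlock(&p->lock);
    return faulted;
}
```

In this model a store issued after toy_translate() always invalidates the
translation before the bytes change, so the snapshot can never be paired with
a page that was silently modified during translation.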

Ilya Leoshkevich (1):
  accel/tcg: Clear PAGE_WRITE before translation

 accel/tcg/translate-all.c    | 59 +++++++++++++++++++++---------------
 accel/tcg/translator.c       | 26 ++++++++++++++--
 include/exec/translate-all.h |  1 +
 3 files changed, 59 insertions(+), 27 deletions(-)

-- 
2.31.1

