Hello,

As discussed on IRC, here is the tentative fix for concurrent code patching.
It helps with the x86_64 .NET app on s390x and survives check-tcg.
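
To make the idea from the IRC log below easier to follow, here is a toy C model of it. This is only a sketch and not code from the patch: `toy_page`, `toy_translate`, `toy_store`, `toy_consistent` and the `TOY_*` constants are hypothetical names invented for illustration. The model is single-threaded, so the serialisation that mmap_lock provides in user-only mode is represented simply by running the steps one after another. It shows why clearing the write permission before reading the guest code keeps a translation from silently going stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins: the names echo QEMU's, but nothing here is QEMU code. */
#define TOY_PAGE_WRITE 0x2   /* hypothetical flag value for this sketch */
#define TOY_CODE_LEN   4

struct toy_page {
    int flags;                        /* TOY_PAGE_WRITE set: stores go through directly */
    uint8_t code[TOY_CODE_LEN];       /* guest code bytes on this page */
    bool has_tb;                      /* a translation of code[] is live */
    uint8_t tb_source[TOY_CODE_LEN];  /* bytes that translation was made from */
};

/* Translation: write-protect the page first, only then read the guest code. */
static void toy_translate(struct toy_page *p)
{
    p->flags &= ~TOY_PAGE_WRITE;
    memcpy(p->tb_source, p->code, TOY_CODE_LEN);
    p->has_tb = true;
}

/*
 * Store: a write to a protected page takes the "fault" path, which drops
 * the now-stale translation and restores write permission before the
 * store is carried out.
 */
static void toy_store(struct toy_page *p, size_t off, uint8_t val)
{
    assert(off < TOY_CODE_LEN);
    if (!(p->flags & TOY_PAGE_WRITE)) {
        p->has_tb = false;
        p->flags |= TOY_PAGE_WRITE;
    }
    p->code[off] = val;
}

/* Invariant: a live translation always matches the current code bytes. */
static bool toy_consistent(const struct toy_page *p)
{
    return !p->has_tb || memcmp(p->tb_source, p->code, TOY_CODE_LEN) == 0;
}
```

In this model `toy_consistent()` holds whichever of `toy_translate()` and `toy_store()` runs first. If the protection were cleared only after the bytes had been read, a store landing in between would leave a live translation that no longer matches the page, which is the race described in the log.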

Bug report: https://lists.nongnu.org/archive/html/qemu-devel/2021-08/msg00644.html

IRC log:
========
<stsquad> iii: my initial thoughts are there is a race in tb_page_add because while we will have flushed all the old pages this new corrupted page gets added the new corrupted one gets in
<iii> stsquad: I think you are right that it can be considered a tb_page_add race. Would it be reasonable to solve it by marking the page read-only before translation and then making sure that it doesn't get its PAGE_WRITE back until translation is complete?
<rth> iii, stsquad: https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg07995.html
<rth> iii: yes, making the page read-only early is the fix, i think. i believe we already hold the mmap_lock around translation, so that should make a writer fault and then wait on the mmap_lock.
<iii> rth: Thanks, let me give it a try. I'll post whatever I come up with as an RFC patch to qemu-devel.
<rth> thanks
<stsquad> rth: doesn't that serialise all translation again?
<stsquad> rth: we could page lock instead?
<rth> stsquad: i thought we were talking about user-only, where translation is always serial.
<rth> stsquad: the link from january is a system-mode bug of the same kind, where, yes, we need to hold the page lock or something.
<stsquad> rth: ahh yes because we don't have zoned translation caches...

Ilya Leoshkevich (1):
  accel/tcg: Clear PAGE_WRITE before translation

 accel/tcg/translate-all.c    | 59 +++++++++++++++++++++---------------
 accel/tcg/translator.c       | 26 ++++++++++++++--
 include/exec/translate-all.h |  1 +
 3 files changed, 59 insertions(+), 27 deletions(-)

-- 
2.31.1