Hi,

After reading related source code and discussions in the mailing list,
I understand that:

1. tb_page_add() calls tlb_protect_code() to clear the dirty state of
the code page by setting TLB_NOTDIRTY in the .addr_write field of the
corresponding CPUTLBEntry *of all vCPUs*.

2. Updates to and accesses of .addr_write (even from TCG-generated
code) are atomic and therefore do NOT result in any undefined behavior.

3. An .addr_write field with TLB_NOTDIRTY set forces qemu_st to take
the so-called "slow path", in which the TBs in the modified portion of
the code page are invalidated, so the modified code will be
re-translated.
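
The three points above can be sketched as a toy model in C. Everything
in it (ToyTLBEntry, ToyPage, toy_protect_code(), toy_store(), the TOY_*
constants, the bit layout and the single-page setup) is made up for
illustration and is not QEMU's actual code; it only assumes that a flag
kept in the low bits of .addr_write makes the fast-path comparison fail.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and bit layout are illustrative, not QEMU's. */
#define TOY_PAGE_BITS    12
#define TOY_PAGE_MASK    (~(((uintptr_t)1 << TOY_PAGE_BITS) - 1))
#define TOY_TLB_NOTDIRTY ((uintptr_t)1 << 0)  /* flag in the low bits */

typedef struct {
    _Atomic uintptr_t addr_write;  /* page address | flag bits */
} ToyTLBEntry;

typedef struct {
    ToyTLBEntry tlb;
    bool tb_valid;       /* a translated block exists for this page */
    int slow_path_hits;  /* how many stores took the slow path */
} ToyPage;

/* Point 1: flag the page so that the next store is noticed. */
void toy_protect_code(ToyPage *p)
{
    atomic_fetch_or(&p->tlb.addr_write, TOY_TLB_NOTDIRTY);
}

/* Point 3: a one-byte store. Returns true if the slow path was taken. */
bool toy_store(ToyPage *p, uintptr_t addr, uint8_t *ram, uint8_t val)
{
    assert(p != NULL && ram != NULL);

    /* Point 2: the comparator is read with a single atomic load. */
    uintptr_t cmp = atomic_load(&p->tlb.addr_write);

    if ((addr & TOY_PAGE_MASK) == cmp) {
        ram[addr & ~TOY_PAGE_MASK] = val;  /* fast path: plain store */
        return false;
    }

    /*
     * Slow path. The toy has a single page, so any mismatch is taken
     * to mean that a flag bit is set in the comparator.
     */
    p->slow_path_hits++;
    p->tb_valid = false;  /* invalidate the TBs of the page */
    atomic_fetch_and(&p->tlb.addr_write, ~TOY_TLB_NOTDIRTY);  /* dirty again */
    ram[addr & ~TOY_PAGE_MASK] = val;
    return true;
}
```

With a page mapped at 0x4000, stores stay on the fast path until
toy_protect_code() runs; the next store then takes the slow path, drops
the TB and clears the flag, and later stores are fast again.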

However, similar to the situation described in:
https://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg02529.html

When we have 2 vCPUs, with one of them writing to the code page while
the other has just translated some code within that same page, the
following situation might happen:

   vCPU thread 1 - writing      vCPU thread 2 - translating
   -----------------------      -----------------------
   TLB check -> slow path
     notdirty_write()
       set dirty flag
     write to RAM
                                tb_gen_code()
                                  tb_page_add()
                                    tlb_protect_code()

   TLB check -> fast path
                                      set TLB_NOTDIRTY
     write to RAM
                                executing unmodified code this time,
                                and maybe also the next time; the
                                modified TBs are never re-translated.
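
To make the interleaving concrete, here is a single-threaded replay of
the diagram as a toy model in C. All names (RaceModel, writer_check(),
writer_commit(), translator_read(), translator_protect(), replay_race(),
replay_ordered()) are hypothetical, and the model assumes that the
writer's TLB check and its write to RAM are separate steps, exactly as
the diagram draws them. It says nothing about whether QEMU's real
locking permits this ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names are illustrative, not QEMU's. */
typedef struct {
    bool notdirty;      /* stands in for TLB_NOTDIRTY in .addr_write */
    bool tb_valid;      /* a TB translated from the page exists */
    uint8_t tb_source;  /* the code byte the TB was translated from */
    uint8_t ram;        /* the single code byte in guest RAM */
} RaceModel;

/* Writer, step A: the TLB check. Returns true for the fast path. */
bool writer_check(RaceModel *m)
{
    if (m->notdirty) {        /* slow path, as in notdirty_write() */
        m->tb_valid = false;  /* invalidate the TBs of the page */
        m->notdirty = false;  /* set dirty flag */
        return false;
    }
    return true;
}

/* Writer, step B: the actual write to RAM. */
void writer_commit(RaceModel *m, uint8_t val)
{
    m->ram = val;
}

/* Translator, step A: tb_gen_code() reads the guest code. */
void translator_read(RaceModel *m)
{
    m->tb_source = m->ram;
    m->tb_valid = true;
}

/* Translator, step B: tlb_protect_code() sets the flag. */
void translator_protect(RaceModel *m)
{
    m->notdirty = true;
}

/* Replays the diagram. Returns true if a stale TB survives. */
bool replay_race(void)
{
    RaceModel m = { .notdirty = true, .tb_valid = true };

    bool fast = writer_check(&m);  /* first write: slow path */
    assert(!fast);
    writer_commit(&m, 1);

    translator_read(&m);           /* TB is built from ram == 1 */

    fast = writer_check(&m);       /* second write: flag not set yet */
    assert(fast);
    translator_protect(&m);        /* the flag arrives too late */
    writer_commit(&m, 2);

    return m.tb_valid && m.tb_source != m.ram;
}

/* Same steps, but the flag is set before the second TLB check. */
bool replay_ordered(void)
{
    RaceModel m = { .notdirty = false, .tb_valid = false };

    translator_read(&m);
    translator_protect(&m);
    (void)writer_check(&m);        /* slow path: the TB is invalidated */
    writer_commit(&m, 2);

    return m.tb_valid && m.tb_source != m.ram;
}
```

In this model replay_race() ends with a valid TB whose source byte no
longer matches RAM, while replay_ordered() does not, because the write
is caught by the slow path.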


My question is:
  Should the situation described above be considered a bug, or is it
  intended behavior for QEMU (so that it is the programmer's fault
  for not flushing the icache after modifying a shared code page)?

Looking forward to your reply, and thanks in advance!

--
Liren Wei




