On 9/6/23 17:16, Peter Xu wrote:

Just a note..

Probably fine for now to reuse block page size, but IIUC the right thing to
do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
kernel_siginfo.si_addr_lsb.

At least for x86 I think that stores the "shift" of covered poisoned page
(one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
huge page, though.. not aware of any man page for that).  It'll then work
naturally when Linux huge pages will start to support sub-huge-page-size
poisoning someday.  We can definitely leave that for later.


I totally agree with that !


--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, 
QEMUFile *file,
      uint8_t *p = block->host + offset;
      int len = 0;
- if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||

Can we move this out of zero page handling?  Zero detection is not
guaranteed to always be the 1st thing to do when processing a guest page.
Currently it'll already skip either rdma or when compression enabled, so
it'll keep crashing there.

Perhaps at the entry of ram_save_target_page_legacy()?

Right, as expected, using migration compression with poisoned pages crashes even with this fix...

The difficulty I see to place the poisoned page verification on the
entry of ram_save_target_page_legacy() is what to do to skip the found poison page(s) if any ?

Should I continue to treat them as zero pages written with save_zero_page_to_file ? Or should I consider the case of an ongoing compression use and create a new code compressing an empty page with save_compress_page() ?

And what about an RDMA memory region impacted by a memory error ?
This is an important aspect.
Does anyone know how this situation is dealt with ? And how it should be handled in Qemu ?

--
Thanks,
William.

Reply via email to