On 9/6/23 17:16, Peter Xu wrote:
Just a note..
Probably fine for now to reuse block page size, but IIUC the right thing to
do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
kernel_siginfo.si_addr_lsb.
At least for x86 I think that stores the "shift" of covered poisoned page
(one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
huge page, though.. not aware of any man page for that). It'll then work
naturally when Linux huge pages start to support sub-huge-page-size
poisoning someday. We can definitely leave that for later.
I totally agree with that!
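For reference, a minimal sketch of how the poisoned range could be derived from si_addr_lsb, assuming Linux and a SIGBUS handler installed with SA_SIGINFO. The struct and helper names (poison_range, poison_range_from_lsb, poison_range_from_siginfo) are hypothetical and are not existing QEMU functions. The only kernel-provided pieces are siginfo_t's si_addr and si_addr_lsb fields and the BUS_MCEERR_AO / BUS_MCEERR_AR codes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: the address range covered by one memory error. */
struct poison_range {
    uintptr_t start; /* first byte of the poisoned range */
    size_t    len;   /* size of the range in bytes, 0 if unknown */
};

/*
 * si_addr_lsb is the least significant bit of the reported address,
 * i.e. a shift: 12 for a 4 KiB page, 21 for a 2 MiB huge page.
 */
static struct poison_range poison_range_from_lsb(uintptr_t addr, unsigned lsb)
{
    struct poison_range r;

    assert(lsb < sizeof(uintptr_t) * 8);
    r.len = (size_t)1 << lsb;
    r.start = addr & ~((uintptr_t)r.len - 1); /* align down to the range */
    return r;
}

/* Hypothetical wrapper showing how a SIGBUS handler might call the helper. */
static struct poison_range poison_range_from_siginfo(const siginfo_t *si)
{
    struct poison_range none = { 0, 0 };

    /* si_addr_lsb is only meaningful for memory-error SIGBUS codes. */
    if (si->si_code != BUS_MCEERR_AO && si->si_code != BUS_MCEERR_AR) {
        return none;
    }
    return poison_range_from_lsb((uintptr_t)si->si_addr,
                                 (unsigned)si->si_addr_lsb);
}
```

With such a helper, the size of the range to skip would come from the kernel's report rather than from the block page size, which is the point made above about future sub-huge-page poisoning.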
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
uint8_t *p = block->host + offset;
int len = 0;
- if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
+ if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||
Can we move this out of zero page handling? Zero detection is not
guaranteed to always be the 1st thing to do when processing a guest page.
Currently it is already skipped with RDMA or when compression is enabled,
so it'll keep crashing there.
Perhaps at the entry of ram_save_target_page_legacy()?
Right, as expected, using migration compression with poisoned pages
crashes even with this fix...
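To make the ordering problem concrete, here is a toy model in C. It is not QEMU code: every name in it (toy_page, save_page_check_in_zero_path, save_page_check_at_entry) is made up for illustration, and treating a poisoned page as a zero page in the second variant is only one of the options under discussion, not a settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are QEMU APIs. */
enum page_action { SENT_AS_ZERO, SENT_COMPRESSED, SENT_NORMAL };

struct toy_page {
    bool poisoned; /* reading the contents would raise SIGBUS */
    bool all_zero; /* contents are all zeroes */
    int  reads;    /* how many times the contents were read */
};

/* Zero detection has to read the page contents. */
static bool toy_is_zero(struct toy_page *p)
{
    p->reads++;
    return p->all_zero;
}

/* Variant A: the poison check lives inside zero-page detection. */
static enum page_action save_page_check_in_zero_path(struct toy_page *p,
                                                     bool compression)
{
    assert(p != NULL);
    if (compression) {
        p->reads++; /* the compressor reads the page; zero path never runs */
        return SENT_COMPRESSED;
    }
    if (p->poisoned || toy_is_zero(p)) {
        return SENT_AS_ZERO;
    }
    p->reads++;
    return SENT_NORMAL;
}

/* Variant B: the poison check runs first, before any other path. */
static enum page_action save_page_check_at_entry(struct toy_page *p,
                                                 bool compression)
{
    assert(p != NULL);
    if (p->poisoned) {
        return SENT_AS_ZERO; /* assumption: poisoned pages are sent as zero */
    }
    if (compression) {
        p->reads++;
        return SENT_COMPRESSED;
    }
    if (toy_is_zero(p)) {
        return SENT_AS_ZERO;
    }
    p->reads++;
    return SENT_NORMAL;
}
```

In variant A with compression enabled, a poisoned page is still read (reads becomes 1), which corresponds to the crash; in variant B it is never read.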
The difficulty I see with placing the poisoned page check at the entry
of ram_save_target_page_legacy() is what to do to skip the poisoned
page(s) found, if any.
Should I continue to treat them as zero pages written with
save_zero_page_to_file? Or should I consider the case where compression
is in use and add new code that compresses an empty page with
save_compress_page()?
And what about an RDMA memory region impacted by a memory error?
This is an important aspect.
Does anyone know how this situation is dealt with? And how should it be
handled in QEMU?
--
Thanks,
William.