On Tue, Feb 11, 2025 at 09:27:04PM +0000, William Roche wrote:
> From: William Roche <william.ro...@oracle.com>
>
> Here is a very simplified version of my fix only dealing with the
> recovery of huge pages on VM reset.
> ---
> This set of patches fixes an existing bug with hardware memory errors
> impacting hugetlbfs memory backed VMs and its recovery on VM reset.
> When using hugetlbfs large pages, any large page location being impacted
> by an HW memory error results in poisoning the entire page, suddenly
> making a large chunk of the VM memory unusable.
>
> The main problem that currently exists in Qemu is the lack of backend
> file repair before resetting the VM memory, resulting in the impacted
> memory to be silently unusable even after a VM reboot.
>
> In order to fix this issue, we take into account the page size of the
> impacted memory block when dealing with the associated poisoned page
> location.
>
> Using the page size information we also try to regenerate the memory
> calling ram_block_discard_range() on VM reset when running
> qemu_ram_remap(). So that a poisoned memory backed by a hugetlbfs
> file is regenerated with a hole punched in this file. A new page is
> loaded when the location is first touched. In case of a discard
> failure we fall back to remapping the memory location.
>
> But we currently don't reset the memory settings and the 'prealloc'
> attribute is ignored after the remap from the file backend.

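For readers following the thread, the "hole punched in this file" step quoted above can be illustrated with a minimal sketch. This is not the code from the patches: align_down() and discard_backing_page() are hypothetical helper names, and the sketch only assumes a Linux file descriptor for the backing file and a power-of-two page size. It rounds the poisoned offset down to the start of its backing page and punches out the whole page with fallocate(2).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical helper: round an offset down to the start of its backing
 * page. page_size must be a non-zero power of two (e.g. 2 MiB). */
static uint64_t align_down(uint64_t offset, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return offset & ~(page_size - 1);
}

/* Hypothetical helper: deallocate the whole backing page that contains
 * 'offset' by punching a hole in the file, keeping the file size unchanged.
 * Returns 0 on success, -1 on failure (errno set by fallocate). */
static int discard_backing_page(int fd, uint64_t offset, uint64_t page_size)
{
    uint64_t start = align_down(offset, page_size);

    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     (off_t)start, (off_t)page_size);
}
```

A caller would treat a -1 return as the discard failure case and fall back to remapping the range, as the cover letter describes.
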
queued patch 1-2, thanks.

-- 
Peter Xu