Hi,

I have reports from a user who is experiencing intermittent issues with qemu being unable to allocate memory for the guest HPT. We see:

  libvirtError: internal error: process exited while connecting to monitor:
  Unexpected error in spapr_alloc_htab() at
  /build/qemu-UwnbKa/qemu-2.5+dfsg/hw/ppc/spapr.c:1030:
  qemu-system-ppc64le: Failed to allocate HTAB of requested size, try with
  smaller maxmem

and in the kernel logs:

  [10103945.040498] alloc_contig_range: 19127 callbacks suppressed
  [10103945.040502] alloc_contig_range: [7a5d00, 7a6500) PFNs busy
  [10103945.040526] alloc_contig_range: [7a5d00, 7a6504) PFNs busy
  [10103945.040548] alloc_contig_range: [7a5d00, 7a6508) PFNs busy
  [10103945.040569] alloc_contig_range: [7a5d00, 7a650c) PFNs busy
  [10103945.040591] alloc_contig_range: [7a5d00, 7a6510) PFNs busy
  [10103945.040612] alloc_contig_range: [7a5d00, 7a6514) PFNs busy
  [10103945.040634] alloc_contig_range: [7a5d00, 7a6518) PFNs busy
  [10103945.040655] alloc_contig_range: [7a5d00, 7a651c) PFNs busy
  [10103945.040676] alloc_contig_range: [7a5d00, 7a6520) PFNs busy
  [10103945.040698] alloc_contig_range: [7a5d00, 7a6524) PFNs busy

I understand that this happens when the request for an appropriately sized and aligned piece of contiguous host memory for the guest hash page table cannot be satisfied from the CMA. The user was attempting to start a 16GB guest, so if I'm reading the qemu code correctly, it would be asking for 128MB of contiguous memory. (There's a quick back-of-the-envelope check of these numbers at the end of this mail.)

The CMA is pretty large - this is taken from /proc/meminfo some time after the allocation failure:

  CmaTotal:       26853376 kB
  CmaFree:         4024448 kB

(The CMA is ~25GB; the host has 512GB of RAM.)

My guess is that the CMA has become fragmented (the machine had 112 days of uptime) and that this interfered with the kernel's ability to service the request. Some googling suggests that these sorts of failures have been seen before:

 * [1] is a Launchpad bug mirrored from the IBM Bugzilla that discusses this issue, especially in the context of PCI passthrough leading to more memory being pinned. No PCI passthrough is occurring in this case.

 * [2] is from Red Hat - it seems to be focussed on particularly huge guests and memory hotplug.

I don't think either of those applies here.

I noticed from [1] that there is a patch from Balbir that apparently helps when VFIO is used - 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA"). The user is running a 4.4 kernel with this backported. There's also a reference to some work Alexey was doing to unpin pages in a more timely fashion. It looks like that stalled, and I can't see anything else particularly relevant in the kernel tree between then and now - although I may well be missing stuff.

So:

 - have I missed anything obvious, or gone completely wrong in my analysis somewhere?

 - have I missed any relevant changes since 4.4 that would fix this?

 - is there any ongoing work on increasing CMA availability?

 - I noticed that in arch/powerpc/kvm/book3s_hv_builtin.c, kvm_cma_resv_ratio is defined as a boot parameter; by default 5% of host memory is reserved for the CMA. Presumably increasing this will increase the likelihood that the kernel can service a request for contiguous memory. Are there any recommended tunings here?

 - is there anything else the user could try?

Thanks!

Regards,
Daniel

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632045
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1304300
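
P.S. Here's the quick back-of-the-envelope check mentioned above, in Python. This is just my reading of the code, not anything authoritative: I'm assuming qemu sizes the HPT at roughly ram_size / 128 rounded up to a power of two (hw/ppc/spapr.c), and that the kernel reserves kvm_cma_resv_ratio percent (default 5) of host RAM for the CMA (arch/powerpc/kvm/book3s_hv_builtin.c).

  GiB = 1 << 30
  MiB = 1 << 20
  kB = 1 << 10

  def hpt_size(ram_bytes):
      """Approximate HPT size: ram/128, rounded up to a power of two."""
      target = ram_bytes // 128
      size = 1
      while size < target:
          size <<= 1
      return size

  # A 16GB guest wants a 128MB contiguous, aligned chunk from the CMA:
  print(hpt_size(16 * GiB) // MiB)    # -> 128

  # Default 5% reservation on a 512GB host:
  print(512 * GiB * 5 // 100 // kB)   # -> 26843545 (kB)
  # ...which lines up with the CmaTotal of 26853376 kB above.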
libvirtError: internal error: process exited while connecting to monitor: Unexpected error in spapr_alloc_htab() at /build/qemu-UwnbKa/qemu-2.5+dfsg/hw/ppc/spapr.c:1030: qemu-system-ppc64le: Failed to allocate HTAB of requested size, try with smaller maxmem and in the kernel logs: [10103945.040498] alloc_contig_range: 19127 callbacks suppressed [10103945.040502] alloc_contig_range: [7a5d00, 7a6500) PFNs busy [10103945.040526] alloc_contig_range: [7a5d00, 7a6504) PFNs busy [10103945.040548] alloc_contig_range: [7a5d00, 7a6508) PFNs busy [10103945.040569] alloc_contig_range: [7a5d00, 7a650c) PFNs busy [10103945.040591] alloc_contig_range: [7a5d00, 7a6510) PFNs busy [10103945.040612] alloc_contig_range: [7a5d00, 7a6514) PFNs busy [10103945.040634] alloc_contig_range: [7a5d00, 7a6518) PFNs busy [10103945.040655] alloc_contig_range: [7a5d00, 7a651c) PFNs busy [10103945.040676] alloc_contig_range: [7a5d00, 7a6520) PFNs busy [10103945.040698] alloc_contig_range: [7a5d00, 7a6524) PFNs busy I understand that this is caused when the request for an appropriately sized and aligned piece of contiguous host memory for the guest hash page table cannot be satisfied from the CMA. The user was attempting to start a 16GB guest, so if I can read qemu code correctly, it would be asking for 128MB of contiguous memory. The CMA is pretty large - this is taken from /proc/meminfo some time after the allocation failure: CmaTotal: 26853376 kB CmaFree: 4024448 kB (The CMA is ~25GB, the host has 512GB of RAM.) My guess is that the CMA has become fragmented (the machine had 112 days of uptime) and that was interfering with the ability of the kernel to service the request? Some googling suggests that these sorts of failures have been seen before: * [1] is a Launchpad bug mirrored from the IBM Bugzilla that talks about this issue especially in the context of PCI passthrough leading to more memory being pinned. No PCI passthrough is occurring in this case. * [2] is from Red Hat - it seems to be especially focussed on particularly huge guests and memory hotplug. I don't think either of those apply here either. I noticed from [1] that there is a patch from Balbir that apparently helps when VFIO is used - 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA"). The user is running a 4.4 kernel with this backported. There's also reference to some work Alexey was doing to unpin pages in a more timely fashion. It looks like that stalled, and I can't see anything else particularly relevant in the kernel tree between then and now - although I may well be missing stuff. So: - have I missed anything obvious here/have I gone completely wrong in my analysis somewhere? - have I missed any great changes since 4.4 that would fix this? - is there any ongoing work in increasing CMA availability? - I noticed in arch/powerpc/kvm/book3s_hv_builtin.c, kvm_cma_resv_ratio is defined as a boot parameter. By default 5% of host memory is reserved for CMA. Presumably increasing this will increase the likelihood that the kernel can service a request for contiguous memory. Are there any recommended tunings here? - is there anything else the user could try? Thanks! Regards, Daniel [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1632045 [2] https://bugzilla.redhat.com/show_bug.cgi?id=1304300