On 04/19/2018 06:08 PM, Greg Kurz wrote: > On Thu, 19 Apr 2018 16:11:37 +0200 > David Hildenbrand <da...@redhat.com> wrote: > >> On 19.04.2018 15:34, Christian Borntraeger wrote: >>> >>> >>> On 04/19/2018 02:58 PM, Cornelia Huck wrote: >>>> On Thu, 19 Apr 2018 14:33:18 +0200 >>>> Igor Mammedov <imamm...@redhat.com> wrote: >>>> >>>>> On Thu, 19 Apr 2018 17:21:23 +1000 >>>>> David Gibson <da...@gibson.dropbear.id.au> wrote: >>>>> >>>>>> If the -mem-path option is set, we attempt to map the guest's RAM from a >>>>>> file in the given path; it's usually used to back guest RAM with >>>>>> hugepages. >>>>>> If we're unable to (e.g. not enough free hugepages) then we fall back to >>>>>> allocating normal anonymous pages. This behaviour can be surprising, >>>>>> but a >>>>>> comment in allocate_system_memory_nonnuma() suggests it's legacy >>>>>> behaviour >>>>>> we can't change. >>>>>> >>>>>> What really isn't ok, though, is that in this case we leave mem_path set. >>>>>> That means functions which attempt to determine the pagesize of main RAM >>>>>> can erroneously think it is hugepage based on the requested path, even >>>>>> though it's not. >>>>>> >>>>>> This is particular bad for the pseries machine type. KVM HV limitations >>>>>> mean the guest can't use pagesizes larger than the host page size used to >>>>>> back RAM. That means that such a fallback, rather than merely giving >>>>>> poorer performance that expected will cause the guest to freeze up early >>>>>> in >>>>>> boot as it attempts to use large page mappings that can't work. >>>>>> >>>>>> This patch addresses the problem by clearing the mem_path variable when >>>>>> we >>>>>> fall back to anonymous pages, meaning that subsequent attempts to >>>>>> determine the RAM page size will get an accurate result. >>>>>> >>>>>> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au> >>>>>> --- >>>>>> numa.c | 1 + >>>>>> 1 file changed, 1 insertion(+) >>>>>> >>>>>> Paolo et al, as with my earlier patches adding some extensions to the >>>>>> helpers for determining backing page sizes, if there are no objections >>>>>> can I get an ack to merge this via my ppc tree? >>>>>> >>>>>> diff --git a/numa.c b/numa.c >>>>>> index 1116c90af9..78a869e598 100644 >>>>>> --- a/numa.c >>>>>> +++ b/numa.c >>>>>> @@ -469,6 +469,7 @@ static void >>>>>> allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, >>>>>> /* Legacy behavior: if allocation failed, fall back to >>>>>> * regular RAM allocation. >>>>>> */ >>>>>> + mem_path = NULL; >>>>>> memory_region_init_ram_nomigrate(mr, owner, name, ram_size, >>>>>> &error_fatal); >>>>>> } >>>>>> #else >>>>> >>>>> mem_path is also used by kvm_s390_apply_cpu_model(), >>>>> and in ccw_init() memory is initialized before CPUs are > > Something similar happens with spapr: kvm_fixup_page_sizes() calls > qemu_getrampagesize() during CPU start, which happens before the machine > init calls allocate_system_memory_nonnuma(). Shouldn't we allocate memory > before calling spapr_init_cpus() in spapr_machine_init() then ? > >>>>> so if QEM was started with -mem-path, then before patch >>>>> created CPU won't have CMM enabled and print warning: >>>>> >>>>> "CMM will not be enabled because it is not compatible with hugetlbfs." >>>>> >>>>> and after patch it might enable CMM if we clear mem_path. >>>>> So question is do we care about this? >>>> >>>> I don't quite remember the cmm semantics here -- Christian? >>> >>> The CMMA interface does not work on large pages. I think the kernel will >>> react >>> with EFAULT in some cases (cmma migration and others) so qemu will probably >>> fail >>> unexpectedly. >>> >>> But this patch seems to only clear mem-path if we do not allocate at all >>> from >>> hugetlbfs. So things should be ok, no? >>> >>> >> >> This even looks like the right thing to me, as hugetlbfs was never >> supported. >> > > Unrelated to this patch, -mem-path can be passed something that doesn't sit > in a hugetlbfs, in which case we use getpagesize()... is there a reason for > kvm_s390_enable_cmma() to filter out this case as well ? Or should we rather > check mem_path isn't NULL and points to a hugetlbfs ?
cmm is somewhat special, so I prefer to have it only for non-mem-path memory since I know that it works for anonymous pages. I would rather white list other mechanisms if necessary in the future. backings