On Fri, 15 Jul 2016 17:54:44 +0200 Thomas Huth <th...@redhat.com> wrote:
> On 15.07.2016 17:18, Greg Kurz wrote:
> > On Fri, 15 Jul 2016 14:28:44 +0200
> > Thomas Huth <th...@redhat.com> wrote:
> > 
> >> On 15.07.2016 10:35, David Gibson wrote:
> >>> On Fri, Jul 15, 2016 at 10:10:25AM +0200, Thomas Huth wrote:
> >>>> Commit 86b50f2e1bef ("Disable huge page support if it is not available
> >>>> for main RAM") already made sure that huge page support is not announced
> >>>> to the guest if the normal RAM of non-NUMA configurations is not backed
> >>>> by a huge page filesystem. However, there is one more case that can go
> >>>> wrong: NUMA is enabled, but the RAM of the NUMA nodes is not configured
> >>>> with huge page support (and only the memory of a DIMM is configured with
> >>>> it). When QEMU is started with the following command line, for example,
> >>>> the Linux guest currently crashes because it is trying to use huge pages
> >>>> on a memory region that does not support huge pages:
> >>>>
> >>>>  qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G \
> >>>>   -object memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
> >>>>   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
> >>>>   -numa node,nodeid=0 -numa node,nodeid=1
> >>>>
> >>>> To fix this issue, we've also got to make sure to disable huge page
> >>>> support when there is a NUMA node that is not using a memory backend
> >>>> with huge page support.
> >>>>
> >>>> Fixes: 86b50f2e1befc33407bdfeb6f45f7b0d2439a740
> >>>> Signed-off-by: Thomas Huth <th...@redhat.com>
> >>>> ---
> >>>>  target-ppc/kvm.c | 10 +++++++---
> >>>>  1 file changed, 7 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> >>>> index 884d564..7a8f555 100644
> >>>> --- a/target-ppc/kvm.c
> >>>> +++ b/target-ppc/kvm.c
> >>>> @@ -389,12 +389,16 @@ static long getrampagesize(void)
> >>>>
> >>>>      object_child_foreach(memdev_root, find_max_supported_pagesize,
> >>>>                           &hpsize);
> >>>>
> >>>> -    if (hpsize == LONG_MAX) {
> >>>> +    if (hpsize == LONG_MAX || hpsize == getpagesize()) {
> >>>>          return getpagesize();
> >>>>      }
> >>>>
> >>>> -    if (nb_numa_nodes == 0 && hpsize > getpagesize()) {
> >>>> -        /* No NUMA nodes and normal RAM without -mem-path ==> no huge pages! */
> >>>> +    /* If NUMA is disabled or the NUMA nodes are not backed with a
> >>>> +     * memory-backend, then there is at least one node using "normal"
> >>>> +     * RAM. And since normal RAM has not been configured with "-mem-path"
> >>>> +     * (what we've checked earlier here already), we can not use huge pages!
> >>>> +     */
> >>>> +    if (nb_numa_nodes == 0 || numa_info[0].node_memdev == NULL) {
> >>>
> >>> Is that second clause sufficient, or do you need to loop through and
> >>> check the memdev of every node?
> >>
> >> Checking the first entry should be sufficient. QEMU forces you to
> >> specify either a memory backend for all NUMA nodes (which we should have
> >> looked at during the object_child_foreach() some lines earlier), or you
> >> must not specify a memory backend for any NUMA node at all. You can not
> >> mix the settings, so checking numa_info[0] is enough.
> > 
> > And what happens if we specify a hugepage memdev backend to one of the
> > nodes and a regular RAM memdev backend to the other?
> 
> I think that should be handled with the object_child_foreach() logic in
> that function ...
> unless I completely misunderstood the code ;-)
> 

You're right. The loop always catches the smallest page size. :)

So this patch indeed fixes the case you describe in the changelog.

Reviewed-by: Greg Kurz <gr...@kaod.org>
Tested-by: Greg Kurz <gr...@kaod.org>

> > I actually wanted to try that but I hit an assertion, which isn't
> > related to this patch I think:
> > 
> > qemu-system-ppc64: memory.c:1934: memory_region_add_subregion_common:
> > Assertion `!subregion->container' failed.
> 
> I just tried that, too, and I did not get that assertion:

I tried with the master branch (commit 14c7d99333e4) + your patch...
I'll investigate that.

> qemu-system-ppc64 -enable-kvm ... -m 2G,slots=4,maxmem=32G \
>  -object memory-backend-file,policy=default,mem-path=/mnt/kvm_hugepage,size=1G,id=mem-mem1 \
>  -object memory-backend-file,policy=default,mem-path=/mnt,size=1G,id=mem-mem2 \
>  -smp 2 -numa node,nodeid=0,memdev=mem-mem1 \
>  -numa node,nodeid=1,memdev=mem-mem2
> 
> And the guest was starting fine, with huge pages disabled.
> 
> > So I tried to trick the logic you are trying to fix the other way
> > round:
> > 
> > -mem-path /dev/hugepages \
> > -m 1G,slots=4,maxmem=32G \
> > -object memory-backend-ram,policy=default,size=1G,id=mem-mem1 \
> > -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 \
> > -smp 2 \
> > -numa node,nodeid=0 -numa node,nodeid=1
> > 
> > The guest fails the same way as before your patch: the hugepage size is
> > advertised to the guest, but the numa node is associated to regular ram.
> 
> You're right, this is still an issue here! ... so we need yet another
> fix for this case :-/
> 

Maybe check the memory backend objects first if we have some, else
return gethugepagesize() if we have mem-path, else return getpagesize()?

> Thanks for the testing!
> 
>  Thomas

You're welcome.

-- 
Greg
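[For illustration, not part of the original mail: the selection order suggested
above (memory-backend objects first, then -mem-path, then the base page size)
can be modelled as a stand-alone sketch. The helper name pick_rampagesize()
and the page-size constants below are hypothetical stand-ins, not QEMU's real
getrampagesize() code; in QEMU the backend scan is done by
object_child_foreach() with find_max_supported_pagesize().]

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the suggested page-size selection order:
 *  1. if memory-backend objects exist, the smallest page size among
 *     them wins (so one regular-RAM backend disables huge pages);
 *  2. otherwise, if -mem-path was given, use that filesystem's huge
 *     page size (gethugepagesize() in QEMU);
 *  3. otherwise, fall back to the base page size (getpagesize()).
 */
#define BASE_PAGE_SIZE  4096L               /* stand-in for getpagesize() */
#define HUGE_PAGE_SIZE  (16L * 1024 * 1024) /* e.g. 16M hugepages on ppc64 */

static long pick_rampagesize(int nr_backends, const long *backend_pagesizes,
                             int have_mem_path)
{
    if (nr_backends > 0) {
        /* smallest page size over all backends wins */
        long min = backend_pagesizes[0];
        for (int i = 1; i < nr_backends; i++) {
            if (backend_pagesizes[i] < min) {
                min = backend_pagesizes[i];
            }
        }
        return min;
    }
    if (have_mem_path) {
        return HUGE_PAGE_SIZE;
    }
    return BASE_PAGE_SIZE;
}
```

Note that this ordering would also cover the second failure case tested above
(-mem-path plus a memory-backend-ram DIMM): because the backend scan takes
precedence over -mem-path, the regular-RAM backend forces the base page size.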