On 02/05/2015 11:23 AM, Daniel P. Berrange wrote:
On Thu, Feb 05, 2015 at 11:21:05AM -0600, Chris Friesen wrote:
On 02/05/2015 10:32 AM, Daniel P. Berrange wrote:
On Thu, Feb 05, 2015 at 10:28:56AM -0600, Chris Friesen wrote:

For what it's worth, I was able to make hugepages work with an older qemu by
commenting out two lines in
virt.libvirt.config.LibvirtConfigGuestMemoryBacking.format_dom()

     def format_dom(self):
         root = super(LibvirtConfigGuestMemoryBacking, self).format_dom()

         if self.hugepages:
             hugepages = etree.Element("hugepages")
             #for item in self.hugepages:
             #    hugepages.append(item.format_dom())
             root.append(hugepages)


This results in XML that looks like:

   <memoryBacking>
     <hugepages/>
   </memoryBacking>
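
For anyone who wants to reproduce that element outside of Nova, here is a minimal sketch using only the standard library's xml.etree.ElementTree. The function name format_memory_backing and its boolean argument are hypothetical stand-ins for illustration; the real Nova class uses its own config objects and etree import.

```python
import xml.etree.ElementTree as ET


def format_memory_backing(hugepages_requested):
    """Build a <memoryBacking> element (hypothetical helper, not Nova code).

    When hugepages are requested, a bare <hugepages/> child is appended
    with no per-page children, mirroring the effect of the two
    commented-out lines in the snippet above.
    """
    root = ET.Element("memoryBacking")
    if hugepages_requested:
        root.append(ET.Element("hugepages"))
    return root


if __name__ == "__main__":
    # Serialize to text so the result can be compared with the XML above.
    print(ET.tostring(format_memory_backing(True), encoding="unicode"))
```

The serializer's whitespace may differ from the hand-formatted XML above, but the element structure is the same: one empty hugepages element inside memoryBacking.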


And a qemu commandline that looks like

-mem-prealloc -mem-path /mnt/huge-2048kB/libvirt/qemu

With that there is no guarantee that the huge pages are being allocated
from the NUMA node on which the guest is actually placed by Nova, hence
we did not intend to support that.

It's possible that the end-user didn't indicate a preference for NUMA.  If
they just asked for hugepages and we have the ability to give it to them I
think we should do so.

In the likely common case of an instance with a single NUMA node, I think
this will likely give the desired behaviour since the default kernel
behaviour is to prefer allocating from the numa node that requested the
memory.  As long as qemu affinity is set before it allocates memory we
should be okay.

The only case that isn't covered is if the flavor specifies multiple numa
nodes.  In that case maybe the scheduler filters should be aware of that and
refuse to assign an instance with multiple numa nodes to a compute node with
an older qemu.

Having the scheduler need to care about versions of software installed on
nodes is a whole heap of extra complexity for no credible gain. It is
perfectly reasonable to just mandate the newer QEMU for this IMHO and
avoid that complexity in Nova.

Okay, then just let it fail on that compute node and the scheduler will retry somewhere else. My point is that it's silly to require a very recent qemu version just to enable hugepages, when the common case of a single numa node VM will likely still work just fine with a much older qemu.

Also, in either case shouldn't we have a check in the code against MIN_QEMU_NUMA_PIN_VERSION or something like that? As it stands, there is no information anywhere in the codebase or requirements.txt that specifies that you need qemu 2.1 or later if you want hugepages or numa pinning support. This could end up being confusing for people that try to get it working--I know it took a while for me to track it down.
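
To make the idea concrete, here is a rough sketch of what such a check might look like. Everything in it is an assumption for illustration: the constant name, the exception class, and the helper functions are hypothetical and are not existing Nova code. The only fact taken from this thread is the 2.1 minimum.

```python
# Hypothetical constant; the thread only establishes "qemu 2.1 or later".
MIN_QEMU_NUMA_PIN_VERSION = (2, 1, 0)


class UnsupportedQemuVersion(Exception):
    """Raised when the installed qemu is older than the required minimum."""


def parse_version(text):
    """Turn a dotted version string such as "2.1" into a 3-item int tuple."""
    parts = [int(part) for part in text.split(".")]
    # Pad short versions with zeros so that "2.1" compares equal to "2.1.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def require_min_qemu(qemu_version, minimum=MIN_QEMU_NUMA_PIN_VERSION):
    """Fail with a clear message instead of a confusing libvirt/qemu error."""
    if parse_version(qemu_version) < minimum:
        raise UnsupportedQemuVersion(
            "qemu %s is too old; hugepages/NUMA pinning needs %s or later"
            % (qemu_version, ".".join(str(n) for n in minimum))
        )
```

With something along these lines, a compute node running an older qemu would at least report why the request failed, rather than leaving people to track it down themselves.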

Chris

