On Mon, Dec 02, 2013 at 11:05:02PM -0800, Vui Chiap Lam wrote:
> Hi Daniel,
>
> I too found the original bp a little hard to follow, so thanks for
> writing up the wiki! I see that the wiki is now linked to the BP,
> which is great as well.
>
> The ability to express CPU topology constraints for guests has
> real-world use, and several drivers, including VMware, can definitely
> benefit from it.
>
> If I understand correctly, in addition to being an elaboration of the
> BP text, the wiki also adds the following:
>
> 1. Instead of returning the best-matching (num_sockets (S),
>    cores_per_socket (C), threads_per_core (T)) tuple, all applicable
>    (S,C,T) tuples are returned, sorted by S, then C, then T.
> 2. A mandatory topology can be provided in the topology computation.
>
> I like 2. because there are multiple reasons why all of a hypervisor's
> CPU resources cannot be allocated to a single virtual machine.
> Given that the mandatory (I prefer maximal) topology is probably fixed
> per hypervisor, I wonder if this information should also be used at
> scheduling time to eliminate incompatible hosts outright.
The host exposes info about the vCPU count it is able to support, and the
scheduler picks a host on that basis. The guest image is just declaring upper
limits on the topology it can support. So if the host is able to support the
guest's vCPU count, the CPU topology decision should never cause a boot
failure. As such, CPU topology has no bearing on scheduling, which is good
because I think it would significantly complicate the problem.

> As for 1., because of the order of precedence of the fields in the
> (S,C,T) tuple, I am not sure how the preferred_topology comes into
> play. Is it meant to help favor alternative values of S?
> Also it might be good to describe a case where returning a list of
> (S,C,T) instead of best-match is necessary. It seems deciding what to
> pick other than the first item in the list requires logic similar to
> that used to arrive at the list in the first place.

It is really all about considering NUMA implications. If you prefer cores and
your VM's RAM crosses a NUMA node, then you sacrifice performance. So if you
know the VM RAM will have to cross a NUMA node, you may set a lower cores
limit to force the return of topologies spanning multiple sockets.

By returning a list of acceptable topologies, the virt driver then has some
flexibility in deciding how to pin guest CPUs / RAM to host NUMA nodes,
and/or expose a guest-visible NUMA topology. For example, if the returned
list gives a choice of

  (2 sockets, 2 cores, 1 thread)
  (1 socket, 4 cores, 1 thread)

then the virt driver can choose whether to place the guest inside a single
NUMA node, or spread it across nodes, and still expose sane NUMA topology
info to the guest.

You could say we should take account of NUMA straight away at the time we
figure out the CPU topology, but I believe that would complicate this code
and make it impractical to share the code across drivers. If a virt driver
doesn't care to do anything with the list of possible topologies, it can
simply ignore it and always take the first element in the list. This is what
we'll do in libvirt initially, but we want to do intelligent automatic NUMA
placement later to improve the performance and utilization of hosts.

Rough sketches of both the tuple enumeration and this selection step are
appended below the signature.

Daniel
-- 
|: http://berrange.com       -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org        -o-  http://virt-manager.org                 :|
|: http://autobuild.org      -o-  http://search.cpan.org/~danberr/        :|
|: http://entangle-photo.org -o-  http://live.gnome.org/gtk-vnc           :|
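
To make the enumeration described in point 1 above concrete, here is a
minimal, hypothetical Python sketch: given a guest vCPU count and upper
limits on sockets, cores and threads (e.g. from image properties), it returns
every (S,C,T) tuple whose product equals the vCPU count, ordered by S, then
C, then T. The function name, parameters and the ascending sort direction are
illustrative assumptions, not the actual Nova code.

import itertools

def possible_topologies(vcpus, max_sockets, max_cores, max_threads):
    # Collect every (sockets, cores, threads) combination within the
    # declared limits whose product matches the requested vCPU count.
    topologies = []
    for s, c, t in itertools.product(range(1, max_sockets + 1),
                                     range(1, max_cores + 1),
                                     range(1, max_threads + 1)):
        if s * c * t == vcpus:
            topologies.append((s, c, t))
    # Sort by sockets, then cores, then threads; the actual preference
    # direction is a detail the wiki/blueprint defines, ascending is used
    # here purely for illustration.
    return sorted(topologies)

# Example: a 4 vCPU guest limited to 2 sockets, 4 cores, 1 thread yields
#   possible_topologies(4, 2, 4, 1) -> [(1, 4, 1), (2, 2, 1)]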
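
And a sketch of the kind of NUMA-driven selection described above, e.g.
choosing between (2 sockets, 2 cores, 1 thread) and (1 socket, 4 cores,
1 thread) depending on whether the guest RAM fits in a single host NUMA
node. The helper and its selection policy are assumptions for illustration
only; a real driver would also handle CPU/RAM pinning and simply taking the
first element remains a valid fallback.

def choose_topology(candidates, guest_ram_mb, numa_node_ram_mb):
    # Hypothetical policy: prefer a single-socket topology when the guest
    # RAM fits inside one host NUMA node, otherwise prefer multiple sockets
    # so a matching guest-visible NUMA layout can be exposed.
    fits_one_node = guest_ram_mb <= numa_node_ram_mb
    for sockets, cores, threads in candidates:
        if fits_one_node and sockets == 1:
            return (sockets, cores, threads)
        if not fits_one_node and sockets > 1:
            return (sockets, cores, threads)
    # Nothing matched the NUMA preference; fall back to the first entry,
    # which is what a driver that ignores the list would do anyway.
    return candidates[0]

# With candidates [(1, 4, 1), (2, 2, 1)] and 8 GB per host NUMA node:
#   choose_topology(candidates, 4096, 8192)  -> (1, 4, 1)  # fits one node
#   choose_topology(candidates, 16384, 8192) -> (2, 2, 1)  # spans nodes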