Hi Aubrey,

I'm actually wondering if this is a new regression bug INTRODUCED in Kilo (as part of the NUMA work). I'll be testing that too by altering my Juno setup a bit (monkeying with kernel MAXCPUS to see if I can get into a similar situation in Juno, but with identical hardware).

The best info I have found so far is Daniel's howto (in the OpenStack docs) for creating a test scenario for NUMA:
http://docs.openstack.org/developer/nova/devref/testing/libvirt-numa.html
(and related pages)

On Fri, Jul 17, 2015 at 7:10 AM, Aubrey Wells <[email protected]> wrote:

> I ran into the different core count thing a while back too and it's not
> fixed in Kilo (that's where I discovered it). I posted to the mailing list
> and didn't get any feedback on it, but as I was just looking in the
> archives to send you the link to the hack I found to fix it, I noticed that
> it silently failed to post to the mailing list. I'll add the text of my
> email below, maybe someone will have some ideas. Original message follows.
>
> =======
>
> Greetings,
> Trying to decide if this is a bug or just a config option that I can't
> find. The setup I'm currently testing with in my lab is two compute nodes
> running Kilo, one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
> + HT). I don't have any CPU pinning enabled in my nova config, which seems
> to have the effect of setting in libvirt.xml a vcpu cpuset element like (if
> created on the 40c node):
>
> <vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>
>
> And then if I migrate that instance to the 16c node, it will bomb out with
> an exception:
>
> Live Migration failure: Invalid value
> '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
> Invalid argument
>
> Which makes sense, since that node doesn't have any vcpus after 15 (0-15).
>
> I can fix the symptom by commenting out a line in
> nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
> cpuset and thus doesn't write that line to libvirt.xml:
> # vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))
>
> And the instance will happily migrate to the host with fewer CPUs, but this
> loses some of the benefit of openstack trying to evenly spread out the
> core usage on the host, at least that's what I think the purpose of that
> is.
>
> I'd rather fix it the right way if there's a config option I don't see or
> file a bug if it's a bug.
>
> What I think should be happening is that when it creates the libvirt
> definition on the destination compute node, it writes out the correct cpuset
> per the specs of the hardware it's going on to.
>
> If it matters, in my nova-compute.conf file, I also have cpu mode and
> model defined to allow me to migrate between the two different
> architectures to begin with (the 40c is Sandybridge and the 16c is Westmere
> so I set it to the lowest common denominator of Westmere):
>
> cpu_mode=custom
> cpu_model=Westmere
>
> Any help is appreciated.
>
>
>
> On Fri, Jul 17, 2015 at 8:58 AM, David Medberry <[email protected]> wrote:
>
>> Hi Daniel,
>>
>> Yep found that all out.
>>
>> Now I'm struggling through the NUMA mismatch. NUMA as there are two cpus.
>> The old CPU was a 10 core 20 thread thus 40 "cpus", {0-9,20-29} and then
>> {10-19,30-39} on the other cell. The new CPU is a 12 core 24 thread.
>> Apparently even in kilo, this results in a mismatch if I'm running a 2 VCPU
>> guest and trying to migrate from new to old. I suspect I have to disable
>> NUMA somehow (filter, etc) but it is entirely non-obvious. And of course
>> I'm doing this again in OpenStack nova (not direct libvirt) so I'm going to
>> do a bit more research and then file a new bug. This also may be fixed in
>> Kilo but I'm not finding it (and it may be fixed in Liberty already and
>> just need a backport.)
>>
>> My apologies for not following up to the list once I found the Kilo
>> solution to the original problem.
>>
>> On Fri, Jul 17, 2015 at 6:10 AM, Daniel P. Berrange <[email protected]> wrote:
>>
>>> On Fri, Jul 17, 2015 at 01:07:56PM +0100, Daniel P. Berrange wrote:
>>> > On Thu, Jul 09, 2015 at 12:00:15PM -0600, David Medberry wrote:
>>> > > Hi,
>>> > >
>>> > > When trying to live-migrate between two distinct CPUs, I kind of expect
>>> > > there to be issues. Which is why openstack supports the "cpu_mode=custom",
>>> > > "cpu_model=MODELNAME" flags for libvirt.
>>> > >
>>> > > When I set those to some Lowest Common Denominator (and restart
>>> > > everything), I still get the issue. I've set both systems to SandyBridge
>>> > > and tested as well as Conroe. The actual CPUs are Ivy Bridge and Haswell
>>> > > (newer than SandyBridge and supersets thereof.)
>>> > >
>>> > > The Older->Newer migration works fine (even without setting a cpu_model)
>>> > > but the newer to older never works.
>>> > >
>>> > > Specifics:
>>> > > OpenStack Juno.2
>>> > > LibVirt: 1.2.2
>>> > >
>>> > > Older: model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (Ivy Bridge)
>>> > > Newer: model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (Haswell)
>>> > >
>>> > > Daniel, Operators: Any ideas?
>>> >
>>> > In versions of Nova prior to Liberty, nova did an incorrect CPU model
>>> > comparison. It checked the source *host* CPU model against the dest
>>> > host CPU model, instead of checking the *guest* CPU model against the
>>> > dest host CPU model.
>>> >
>>> > This is fixed in Liberty, provided you have the cpu_mode=custom and
>>> > cpu_model=MODELNAME parameters set. Unfortunately the fix will only
>>> > work for guests that are launched under Liberty codebase as it needed
>>> > a database addition. So if you have existing running guests from Juno
>>> > those need restarting after upgrade.
>>>
>>> Sigh, s/Liberty/Kilo/ in everything I wrote here
>>>
>>> Regards,
>>> Daniel
>>> --
>>> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
>>> |: http://libvirt.org -o- http://virt-manager.org :|
>>> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
>>> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>>>
>>
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> [email protected]
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>
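
For anyone who wants to spot this before attempting a migration, here is a minimal sketch of the cpuset mismatch described in Aubrey's message above. It expands a cpuset string and lists the CPU ids that a destination host with fewer CPUs would not have. The helper names expand_cpuset and cpus_missing_on_host are hypothetical, made up for illustration; this is not Nova's or libvirt's own code, and it assumes host CPUs are numbered contiguously from 0, as in the "(0-15)" remark.

```python
def expand_cpuset(spec):
    """Expand a cpuset string such as "0-3,8,10-11" into a set of CPU ids.

    Hypothetical helper for illustration only.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "10-19" is inclusive on both ends.
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError("invalid CPU range: %r" % part)
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_missing_on_host(cpuset_spec, host_cpu_count):
    """Return the sorted CPU ids in cpuset_spec that the host lacks.

    Assumption: the host exposes CPUs 0..host_cpu_count-1 with no gaps.
    """
    host_cpus = set(range(host_cpu_count))
    return sorted(expand_cpuset(cpuset_spec) - host_cpus)


if __name__ == "__main__":
    # The cpuset from the error message in the thread, checked against
    # the 16-CPU destination node.
    spec = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"
    print(cpus_missing_on_host(spec, 16))
```

An empty list means every CPU in the guest's cpuset exists on the destination. Anything else is the set of ids that would trigger the "Invalid value ... for 'cpuset.cpus'" failure quoted above.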
