On 09/20/2016 11:20 AM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 11:01:23AM -0400, Sean Dague wrote:
>> On 09/20/2016 10:38 AM, Daniel P. Berrange wrote:
>>> On Tue, Sep 20, 2016 at 09:20:15AM -0400, Sean Dague wrote:
>>>> This is a bit delayed due to the release rush, finally getting back to
>>>> writing up my experiences at the Ops Meetup.
>>>>
>>>> Nova Feedback Session
>>>> =====================
>>>>
>>>> We had a double session for Feedback for Nova from Operators, raw
>>>> etherpad here - https://etherpad.openstack.org/p/NYC-ops-Nova.
>>>>
>>>> The median release people were on in the room was Kilo. Some were
>>>> upgrading to Liberty, and many had clouds older than Kilo. Remember,
>>>> these are the larger ops environments that are engaged enough with the
>>>> community to send people to the Ops Meetup.
>>>>
>>>>
>>>> Performance Bottlenecks
>>>> -----------------------
>>>>
>>>> * scheduling issues with Ironic (this is a bug we got through during
>>>>   the week after the session)
>>>> * live snapshots actually end up being a performance issue for people
>>>>
>>>> The workarounds config group was not well known, and everyone in the
>>>> room wished we advertised it a bit more. The solution for snapshot
>>>> performance is in there.
>>>>
>>>> There were also general questions about what scale cells should be
>>>> considered at.
>>>>
>>>> ACTION: we should make sure workarounds are advertised better
>>>
>>> Workarounds ought to be something that admins are rarely, if
>>> ever, having to deal with.
>>>
>>> If the lack of live snapshots is such a major performance problem
>>> for ops, this tends to suggest that our default behaviour is wrong,
>>> rather than that we need to publicise that operators should set this
>>> workaround.
>>>
>>> e.g., instead of optimizing for the case of broken live snapshot
>>> support by default, we should optimize for the case of working
>>> live snapshots by default.
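[For context: the "workarounds" config group mentioned above is a stanza in nova.conf. A minimal sketch of what an operator would set to get the fast live-snapshot path, using the option name that comes up later in the thread (treat the exact stanza as illustrative, not a complete config):]

```ini
# nova.conf -- illustrative fragment only
[workarounds]
# Default is True (live snapshots disabled). Setting it to False
# enables libvirt live snapshots, avoiding the slow cold-snapshot
# path discussed in this thread.
disable_libvirt_livesnapshot = False
```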
>>> The broken live snapshot stuff was so
>>> rare that no one has ever reproduced it outside of the gate
>>> AFAIK.
>>>
>>> IOW, rather than hardcoding disable_live_snapshot=True in nova,
>>> we should just set it in the gate CI configs, and leave it set
>>> to False in Nova, so operators get good performance out of the
>>> box.
>>>
>>> Also it has been a while since we added the workaround, and IIRC,
>>> we've got newer Ubuntu available on at least some of the gate
>>> hosts now, so we have the ability to test whether it still
>>> hits newer Ubuntu.
>>
>> Here is my reconstruction of the snapshot issue from what I can remember
>> of the conversation.
>>
>> Nova defaults to live snapshots. This uses the libvirt facility which
>> dumps both memory and disk. And then we throw away the memory. For large
>> memory guests (especially volume backed ones that might have a fast path
>> for the disk) this leads to a lot of overhead for no gain. The
>> workaround got them past it.
>
> I think you've got it backwards there.
>
> Nova defaults to *not* using live snapshots:
>
>     cfg.BoolOpt(
>         'disable_libvirt_livesnapshot',
>         default=True,
>         help="""
>     Disable live snapshots when using the libvirt driver.
>     ...""")
>
> When live snapshot is disabled like this, the snapshot code is unable
> to guarantee a consistent disk state. So the libvirt nova driver will
> stop the guest by doing a managed save (this saves all memory to
> disk), then does the disk snapshot, then restores the managed save
> (which loads all memory from disk).
>
> This is terrible for multiple reasons:
>
> 1. the guest workload stops running while the snapshot is taken
> 2. we churn disk I/O saving & loading VM memory
> 3. you can't do it at all if host PCI devices are attached to
>    the VM
>
> Enabling live snapshots by default fixes all these problems, at the
> risk of hitting the live snapshot bug we saw in the gate CI but never
> anywhere else.
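[For readers following along: the two code paths Daniel describes can be sketched as a toy Python function. This is not the real nova driver code; the step names (`managed_save`, `disk_snapshot`, etc.) are hypothetical labels standing in for the libvirt operations described above.]

```python
# Toy sketch of the snapshot path selection Daniel describes.
# Not nova code; step names are illustrative labels only.

def snapshot_steps(disable_livesnapshot):
    """Return the sequence of steps the driver would take for a snapshot."""
    steps = []
    if disable_livesnapshot:
        # Cold path (disable_libvirt_livesnapshot=True, the default):
        # the guest stops running and all its RAM is churned through disk.
        steps.append("managed_save")    # save all guest memory to disk
        steps.append("disk_snapshot")   # snapshot the now-quiesced disk
        steps.append("restore")         # load all guest memory back
    else:
        # Live path: disk-only snapshot while the guest keeps running.
        steps.append("live_disk_snapshot")
    return steps
```

[The point of the sketch is just the asymmetry: the cold path does three heavy operations and pauses the workload, while the live path is a single disk-only operation.]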
Ah, right. I'll propose inverting the default and we'll see if we can
get past the testing in the gate -
https://review.openstack.org/#/c/373430/

	-Sean

-- 
Sean Dague
http://dague.net

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev