While the Trusty transition was mostly uneventful, it has exposed a particular issue in libvirt, which is now causing a ~25% failure rate on most tempest jobs.

As can be seen here - https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297 ... the libvirt live_snapshot code is something that our test pipeline has never tested before, because we weren't running a new enough libvirt to take that path.

Right now it's exploding, a lot - https://bugs.launchpad.net/nova/+bug/1334398

Snapshotting gets used in Tempest to create images for testing, so image setup tests are doing a decent number of snapshots. If I had to take a completely *wild guess*, it's that libvirt can't do 2 live_snapshots at the same time. It's probably something that most people haven't hit. The wild guess is based on other libvirt issues we've hit that other people haven't, which are basically always triggered by parallel operations.

My 'stop the bleeding' suggested fix is this - https://review.openstack.org/#/c/102643/ which just effectively disables this code path for now. Then we can get some libvirt experts engaged to help figure out the right long term fix.

I think there are a couple of options:

1) See if newer libvirt fixes this (1.2.5 just came out), and if so mandate a known working version as the minimum. This would actually take a bunch of work, because we'd need to be able to test a non packaged libvirt in our pipeline. We'd need volunteers for that.

2) Lock snapshot operations in nova-compute, so that we can only do 1 at a time. Hopefully it's just 2 concurrent snapshot operations that are the issue, not any other libvirt op during a snapshot, so serializing snapshot ops in n-compute could put the kid gloves on libvirt and make it not break here. This also needs some volunteers, as we're going to be playing a game of progressive serialization until we get to a point where it looks like the failures go away.

3) Roll back to precise. I put this idea here for completeness, but I think it's a terrible choice. This is one isolated, previously untested (by us), code path.
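
To make option 2 a bit more concrete, here is a rough sketch of what "only 1 snapshot at a time" could look like. This is not nova code: it uses plain Python threading, and every name in it (`_snapshot_lock`, `serialized`, `ConcurrencyProbe`, `live_snapshot`, `run_parallel`) is hypothetical. The probe class is only a stand-in for libvirt so the effect of the lock can be observed; a real change would have to fit nova-compute's own concurrency model.

```python
import functools
import threading
import time

# Hypothetical process-wide lock; in this sketch there is one per process.
_snapshot_lock = threading.Lock()


def serialized(lock):
    """Decorator: only one caller at a time may run the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                return func(*args, **kwargs)
        return wrapper
    return decorator


class ConcurrencyProbe:
    """Stand-in for libvirt that records how many calls overlap."""

    def __init__(self):
        self._guard = threading.Lock()
        self.active = 0
        self.max_active = 0

    def snapshot(self, instance_id):
        with self._guard:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # pretend the snapshot takes a while
        with self._guard:
            self.active -= 1
        return "snap-%s" % instance_id


@serialized(_snapshot_lock)
def live_snapshot(probe, instance_id):
    # Assumption: this is the only entry point that triggers a snapshot.
    return probe.snapshot(instance_id)


def run_parallel(probe, count):
    """Request `count` snapshots from separate threads and collect results."""
    results = [None] * count

    def worker(index):
        results[index] = live_snapshot(probe, index)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

If the problem turns out to be any libvirt op overlapping a snapshot rather than 2 snapshots overlapping each other, the same lock would have to be widened to cover those ops too, which is the "progressive serialization" game mentioned in option 2.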

We can't stay on libvirt 0.9.6 forever, so we actually need to fix this for real (be it in nova's use of libvirt, or in libvirt itself).

There might be other options as well, ideas welcome. But for right now, we should stop the bleeding, so that nova/libvirt isn't blocking everyone else from merging code.

-Sean

--
Sean Dague
http://dague.net
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev