On Wed, Jul 09, 2014 at 08:34:06AM -0400, Sean Dague wrote:
> On 07/09/2014 03:58 AM, Daniel P. Berrange wrote:
> > On Tue, Jul 08, 2014 at 02:50:40PM -0700, Joe Gordon wrote:
> >>>> But for right now, we should stop the bleeding, so that nova/libvirt
> >>>> isn't blocking everyone else from merging code.
> >>>
> >>> Agreed, we should merge the hack and treat the bug as release blocker
> >>> to be resolved prior to Juno GA.
> >>>
> >>
> >>
> >> How can we prevent libvirt issues like this from landing in trunk in the
> >> first place? If we don't figure out a way to prevent this from landing in the
> >> first place I fear we will keep repeating this same pattern of failure.
>
> Right, this is where math is against us. If a race shows up 1% of the
> time, you need 66 runs to have a 50% chance of seeing it. I still haven't
> calibrated the bugs to an absolute scale, but I think based on what I
> remember this livesnapshot bug was probably a 3-4% bug (per Tempest
> run). So you'd need 50 Tempest runs to have an 80% chance to see it show
> up again.
>
> (Absolute calibration of the bugs is on my todo list for Elastic
> Recheck, maybe it's time to put that in front of fixing the bugs)
>
> > Realistically I don't think there was much/any chance of avoiding this
> > problem. Despite many days of work trying to reproduce it by multiple
> > people, no one has managed even 1 single failure outside of the gate.
> > Even inside the gate it is hard to reproduce. I still have absolutely
> > no clue what is failing after days of investigation & debugging with
> > all the tricks I can think of, because as I say, it works perfectly
> > every time I try it, except in the gate where it is impossible to
> > debug it.
>
> Out of curiosity, is your reproduce using eventlet? My expectation is
> that eventlet's concurrency actually exacerbates this because when the
> snapshot starts we're now doing IO, and that means it's exactly the time
> that other compute work will be triggered.

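To put rough numbers on the estimate quoted above: if a failure shows up
independently with probability p per run, the chance of seeing it at least
once in n runs is 1 - (1 - p)^n. Here is a minimal Python sketch under that
independence assumption, which real gate runs may not satisfy. The function
names are made up for illustration and are not part of Tempest or Elastic
Recheck.

```python
import math


def detection_probability(p, runs):
    """Chance of at least one failure in `runs` independent runs,
    each failing with probability `p` (assumed constant)."""
    return 1.0 - (1.0 - p) ** runs


def runs_needed(p, target):
    """Smallest number of independent runs that gives at least a
    `target` chance of seeing the failure once."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n and round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # A 1% race, 50% chance of hitting it at least once.
    print(runs_needed(0.01, 0.5))
    # A 3-4% race: chance of a hit in 50 runs, and runs needed for 80%.
    for p in (0.03, 0.04):
        print(p, round(detection_probability(p, 50), 2), runs_needed(p, 0.8))
```

Under these assumptions a 1% race needs about 69 runs for a 50% chance, and
a 3-4% race shows up in 50 runs with roughly 78-87% probability, so the
figures quoted above are in the right ballpark.
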
I've tried both running the tempest suite itself, and also running
a dedicated stress test written against libvirt snapshot APIs in
C.

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev