I think we may have pinned libvirt-bin as well (1.3.1), but I can't guarantee that, sorry - I would suggest it's worth trying to pin both initially; something along the lines of the apt preferences snippet below should do it.
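This is only a sketch from memory - the exact qemu binary package names vary by node (check "dpkg -l | grep qemu" for what's actually installed), and I don't have the full libvirt 1.3.1 package version to hand, hence the wildcard. Something like this in /etc/apt/preferences.d/qemu-libvirt-pin:

Explanation: hold qemu at the version that worked for us
Package: qemu qemu-kvm qemu-system-x86 qemu-system-common qemu-utils qemu-block-extra
Pin: version 1:2.5+dfsg-5ubuntu10.5
Pin-Priority: 1001

Explanation: hold libvirt - the version pattern is a guess, check apt-cache policy
Package: libvirt-bin libvirt0
Pin: version 1.3.1*
Pin-Priority: 1001

A priority above 1000 lets apt downgrade to the pinned version if a newer one is already installed; "apt-cache policy qemu-kvm libvirt-bin" should show the pin taking effect before you run any upgrade or downgrade.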
Chris

On Thu, 23 Nov 2017 at 17:42 Joe Topjian <j...@topjian.net> wrote:

> Hi Chris,
>
> Thanks - we will definitely look into this. To confirm: did you also
> downgrade libvirt as well, or was it all qemu?
>
> Thanks,
> Joe
>
> On Thu, Nov 23, 2017 at 9:16 AM, Chris Sarginson <csarg...@gmail.com> wrote:
>
>> We hit the same issue a while back (I suspect), which we seemed to
>> resolve by pinning QEMU and related packages at the following version (you
>> might need to hunt down the debs manually):
>>
>> 1:2.5+dfsg-5ubuntu10.5
>>
>> I'm certain there's a Launchpad bug for Ubuntu qemu regarding this, but I
>> don't have it to hand.
>>
>> Hope this helps,
>> Chris
>>
>> On Thu, 23 Nov 2017 at 15:33 Joe Topjian <j...@topjian.net> wrote:
>>
>>> Hi all,
>>>
>>> We're seeing some strange libvirt issues in an Ubuntu 16.04 environment.
>>> It's running Mitaka, but I don't think this is a problem with OpenStack
>>> itself.
>>>
>>> We're in the process of upgrading this environment from Ubuntu 14.04
>>> with the Mitaka cloud archive to 16.04. Instances are being live migrated
>>> (over an NFS share) to new 16.04 compute nodes (fresh installs), so there is
>>> a change between libvirt versions (1.2.2 to 1.3.1). The problem we're seeing
>>> only happens on the 16.04/1.3.1 nodes.
>>>
>>> We're getting occasional reports of instances that cannot be
>>> snapshotted. Upon investigation, the snapshot process quits early with a
>>> libvirt/qemu lock timeout error. We then see that the instance's XML file
>>> has disappeared from /etc/libvirt/qemu, and we must restart libvirt and
>>> hard-reboot the instance to get things back to a normal state. Trying to
>>> live-migrate the instance to another node causes the same thing to happen.
>>>
>>> However, at some random time, either the snapshot or the migration will
>>> work without error. I haven't been able to reproduce this issue on my own
>>> and haven't been able to figure out the root cause by inspecting the
>>> instances reported to me.
>>>
>>> One thing that has stood out is the length of time it takes for libvirt
>>> to start. If I run "/etc/init.d/libvirt-bin start", it takes at least 5
>>> minutes before a simple "virsh list" will work; otherwise the command just
>>> hangs. If I increase libvirt's logging level, I can see that during this
>>> period libvirt is working on iptables and ebtables rules (it looks like
>>> it's shelling out to the commands).
>>>
>>> But if I run "libvirtd -l" straight on the command line, all of this
>>> completes within 5 seconds (including all of the shelling out).
>>>
>>> My initial thought is that systemd is doing some type of throttling
>>> between the system and user slices, but I've tried comparing slice
>>> attributes and, probably due to my lack of understanding of systemd, can't
>>> find anything to prove this.
>>>
>>> Is anyone else running into this problem? Does anyone know what might be
>>> the cause?
>>>
>>> Thanks,
>>> Joe
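P.S. For comparing the slow systemd start against running libvirtd by hand, the debug logging Joe mentions is normally just a couple of lines in /etc/libvirt/libvirtd.conf - this is from memory, so double-check against the commented defaults in the shipped file:

# 1 = debug, 2 = info, 3 = warning, 4 = error
log_level = 1
# send everything to a file, entries are timestamped
log_outputs = "1:file:/var/log/libvirt/libvirtd.log"

Restart libvirt-bin (or run libvirtd by hand) and time "virsh list" while watching that log - the iptables/ebtables shell-outs are logged with timestamps, which should make it fairly clear where the 5 minutes is going in the systemd case.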