Right, after a bit more digging around, the issue seems to relate to the Ceph storage. Here is the log from libvirt:

cat r-1407-VM.log
2015-10-21 11:04:59.262+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name r-1407-VM -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 256 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 815d2860-cc7f-475d-bf63-02814c720fe4 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/r-1407-VM.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=rbd:Primary-ubuntu-1/c3f90fb4-c1a6-4e99-a2c0-64ae4517412e:id=admin:key=AQDiDbJR2GqPABAAWCcsUQ+UQwK8z9c6LWrizw==:auth_supported=cephx\;none:mon_host=ceph-mon.csprdc.arhont.com\:6789,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 -drive file=/usr/share/cloudstack-common/vms/systemvm.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=54,id=hostnet0,vhost=on,vhostfd=55 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=02:00:2e:f7:00:18,bus=pci.0,addr=0x3,rombar=0,romfile= -netdev tap,fd=56,id=hostnet1,vhost=on,vhostfd=57 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=0e:00:a9:fe:01:42,bus=pci.0,addr=0x4,rombar=0,romfile= -netdev tap,fd=58,id=hostnet2,vhost=on,vhostfd=59 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=06:0c:b6:00:02:13,bus=pci.0,addr=0x5,rombar=0,romfile= -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/r-1407-VM.agent,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=r-1407-VM.vport -device usb-tablet,id=input0 -vnc 192.168.169.2:10,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
Domain id=42 is tainted: high-privileges
libust[20136/20136]: Warning: HOME environment variable not set. Disabling LTTng-UST per-user tracing. (in setup_local_apps() at lttng-ust-comm.c:305)
char device redirected to /dev/pts/13 (label charserial0)
librbd/LibrbdWriteback.cc: In function 'virtual ceph_tid_t librbd::LibrbdWriteback::write(const object_t&, const object_locator_t&, uint64_t, uint64_t, const SnapContext&, const bufferlist&, utime_t, uint64_t, __u32, Context*)' thread 7ffa6b7fe700 time 2015-10-21 12:05:07.901876
librbd/LibrbdWriteback.cc: 160: FAILED assert(m_ictx->owner_lock.is_locked())
 ceph version 0.94.4 (95292699291242794510b39ffde3f4df67898d3a)
 1: (()+0x17258b) [0x7ffa92ef758b]
 2: (()+0xa9573) [0x7ffa92e2e573]
 3: (()+0x3a90ca) [0x7ffa9312e0ca]
 4: (()+0x3b583d) [0x7ffa9313a83d]
 5: (()+0x7212c) [0x7ffa92df712c]
 6: (()+0x9590f) [0x7ffa92e1a90f]
 7: (()+0x969a3) [0x7ffa92e1b9a3]
 8: (()+0x4782a) [0x7ffa92dcc82a]
 9: (()+0x56599) [0x7ffa92ddb599]
 10: (()+0x7284e) [0x7ffa92df784e]
 11: (()+0x162b7e) [0x7ffa92ee7b7e]
 12: (()+0x163c10) [0x7ffa92ee8c10]
 13: (()+0x8182) [0x7ffa8ec49182]
 14: (clone()+0x6d) [0x7ffa8e97647d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
terminate called after throwing an instance of 'ceph::FailedAssertion'
2015-10-21 11:05:08.091+0000: shutting down

For some reason this only affects virtual routers and not the CPVM/SSVM or the guest VMs. Any thoughts?

I will post it to the ceph mailing list as well. Perhaps someone there will have a clue.

Thanks

----- Original Message -----
From: "Andrei Mikhailovsky" <[email protected]>
To: [email protected]
Sent: Wednesday, 21 October, 2015 12:25:28 PM
Subject: Re: KVM - No longer able to start virtual routers

I also forgot to mention that the Console Proxy and SSVM virtual machines can be successfully created.
I've removed them and they were recreated by ACS without any issues. It's the virtual routers which are not playing well. This tells me that there is no issue with the systemvm template or the storage servers.

Andrei

----- Original Message -----
From: "Andrei Mikhailovsky" <[email protected]>
To: [email protected]
Sent: Wednesday, 21 October, 2015 11:36:28 AM
Subject: KVM - No longer able to start virtual routers

Hello guys,

I have recently upgraded from ACS 4.5.1 to 4.5.2. The upgrade went well as far as I can tell, with no error messages. As there was no need to update the systemvm templates, I've not bothered to reboot the system VMs and virtual routers.

I am running Ubuntu 14.04 KVM hosts, using NFS for secondary storage and Ceph RBD for primary storage.

Today I tried to restart one of the networks with the clean up option and noticed that the restart failed. After a bit of poking around I've identified that ACS is no longer able to create virtual routers. ACS just shows them with Starting status, but they are not starting on the host server. I've looked at the host server which is supposed to be starting that virtual router, and there is no sign of the domain being created.

The host server has the following entries in the agent.log file:

2015-10-21 10:39:02,140 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Executing: /usr/share/cloudstack-common/scripts/vm/hypervisor/kvm/patchviasocket.pl -n r-1405-VM -p %template=domP%name=r-1405-VM%eth2ip=XXX.XXX.XXX.53%eth2mask=255.255.255.128%gateway=XXX.XXX.XXX.1%eth0ip=10.1.1.1%eth0mask=255.255.255.0%domain=kaspersky.local%cidrsize=24%dhcprange=10.1.1.1%eth1ip=169.254.3.232%eth1mask=255.255.0.0%type=router%disable_rp_filter=true%dns1=178.248.108.130%dns2=91.224.1.152
2015-10-21 10:39:02,172 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Exit value is 111
2015-10-21 10:39:02,173 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection refused
2015-10-21 10:39:02,174 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) passcmd failed:ERROR: unable to connect to /var/lib/libvirt/qemu/r-1405-VM.agent - Connection refused

I've checked ssvm and it seems to be running perfectly well. The /usr/local/cloud/systemvm/ssvm-check.sh script produces no warnings or errors, the nfs mount point is mounted and writable.

I've also tried to create a new vm with a new network and that vm is not starting because ACS is unable to start the virtual router for the network.

Any idea how to get this issue resolved?

Thanks
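
For anyone wanting to reproduce the "Connection refused" check by hand: exit value 111 in the agent.log entries matches ECONNREFUSED on Linux, which usually means nothing is listening on the socket at that moment. Below is a minimal Python sketch that probes a UNIX socket path and reports the symbolic errno. The function name probe_unix_socket is made up for illustration and is not part of CloudStack or libvirt; it also assumes the .agent file is a plain UNIX stream socket, as the "-chardev socket,...,server,nowait" option in the qemu command line suggests.

```python
import errno
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a UNIX stream socket; return (ok, detail).

    Hypothetical helper for illustration only. It just opens and closes
    a connection, it does not send anything to the guest.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError as exc:
        # Map the numeric errno back to its symbolic name,
        # e.g. 111 -> ECONNREFUSED on Linux.
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, f"{name} ({exc.errno}): {exc.strerror}"
    finally:
        sock.close()
    return True, "connected"
```

Calling probe_unix_socket("/var/lib/libvirt/qemu/r-1405-VM.agent") on the host while the router is in Starting state should show whether the socket file is missing (ENOENT) or present with no listener (ECONNREFUSED). Either result would be consistent with the qemu process having already exited.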
