exion created CLOUDSTACK-10355:
----------------------------------

Summary: After upgrade to 4.11, Ceph RBD primary storage fails connection and renders node unusable
Key: CLOUDSTACK-10355
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10355
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: cloudstack-agent
Affects Versions: 4.11.0.0
Reporter: exion
On a KVM node with Ceph RBD primary storage that was working perfectly under 4.10, the CloudStack agent can no longer connect the RBD pool in libvirt after upgrading to 4.11. The agent log contains only a generic "Operation not supported" error, shortly followed by a lost connection to the management server:

    2018-04-06 16:27:37,650 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
    2018-04-06 16:27:37,652 WARN [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to create it.
    2018-04-06 16:27:37,653 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with duplicate paths
    2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on: storagepool1:6789,: Operation not supported
    2018-04-06 16:27:42,762 INFO [cloud.agent.Agent] (Agent-Handler-4:null) (logid:) Lost connection to the server. Dealing with the remaining commands...

The exact same pool worked before the upgrade:

    2018-04-06 12:53:52,847 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
    2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
    2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor] (agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
    2018-04-06 12:53:53,171 INFO [cloud.agent.Agent] (agentRequest-Handler-2:null) (logid:14dace5e) Proccess agent ready command, agent id = 46

To rule out problems with the host itself, I bypassed CloudStack and attached the pool directly to libvirt using the following XML definition (test.xml). This worked as expected:

    <pool type="rbd">
      <name>be80af6a-7201-3410-8da4-9b3b58c4954f</name>
      <source>
        <name>cephstor1</name>
        <host name='storagepool1' port='6789'/>
        <auth username='admin' type='ceph'>
          <secret uuid='XXXXX'/>
        </auth>
      </source>
    </pool>

    virsh pool-create test.xml
    Pool be80af6a-7201-3410-8da4-9b3b58c4954f created from test.xml

    root@compute6:~# virsh pool-info be80af6a-7201-3410-8da4-9b3b58c4954f
    Name:           be80af6a-7201-3410-8da4-9b3b58c4954f
    UUID:           47afe7d4-61cb-46c5-a642-93712c758b5c
    State:          running
    Persistent:     no
    Autostart:      no
    Capacity:       10.05 TiB
    Allocation:     2.22 TiB
    Available:      2.71 TiB

This suggests the issue lies in the way CloudStack interfaces with the libvirt daemon, not in the host's libvirt or Ceph setup.
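One way to check whether the definition the agent builds differs from the working test.xml above is to render the working definition from its parameters and diff the two. Below is a minimal sketch in Java (the agent's language), assuming the pool parameters from this report. The class `RbdPoolXml`, the `Monitor` record and the `render` method are hypothetical names for illustration only; they are not CloudStack's own pool-definition code, and the secret UUID stays redacted as in the report.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of CloudStack) that renders a libvirt
 * RBD storage pool definition equivalent to the test.xml in this report.
 */
public final class RbdPoolXml {

    /** One Ceph monitor endpoint: host name and port. */
    public record Monitor(String host, int port) {}

    /**
     * Builds the pool XML. Values are inserted without XML escaping,
     * which is fine for UUIDs, pool names and plain host names.
     */
    public static String render(String poolName, String cephPool, List<Monitor> monitors,
                                String authUser, String secretUuid) {
        StringBuilder sb = new StringBuilder();
        sb.append("<pool type=\"rbd\">\n");
        sb.append("  <name>").append(poolName).append("</name>\n");
        sb.append("  <source>\n");
        // Ceph pool that holds the RBD images
        sb.append("    <name>").append(cephPool).append("</name>\n");
        // One <host> element per monitor
        for (Monitor m : monitors) {
            sb.append("    <host name='").append(m.host())
              .append("' port='").append(m.port()).append("'/>\n");
        }
        // cephx credentials: libvirt reads the key from the secret with this UUID
        sb.append("    <auth username='").append(authUser).append("' type='ceph'>\n");
        sb.append("      <secret uuid='").append(secretUuid).append("'/>\n");
        sb.append("    </auth>\n");
        sb.append("  </source>\n");
        sb.append("</pool>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = render("be80af6a-7201-3410-8da4-9b3b58c4954f", "cephstor1",
                List.of(new Monitor("storagepool1", 6789)), "admin", "XXXXX");
        System.out.print(xml);
    }
}
```

Running `main` prints the same elements as test.xml, so the output can be saved and passed to `virsh pool-create` for the same manual check, or diffed against whatever XML the agent is found to submit.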