exion created CLOUDSTACK-10355:
----------------------------------
Summary: After upgrade to 4.11, Ceph RBD primary storage fails
connection and renders node unusable
Key: CLOUDSTACK-10355
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-10355
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: cloudstack-agent
Affects Versions: 4.11.0.0
Reporter: exion
On a perfectly working 4.10 node with KVM hypervisor and Ceph RBD primary
storage, after upgrading to 4.11, the CloudStack agent is unable to connect the
RBD pool in libvirt and logs only a generic "Operation not supported" error:
2018-04-06 16:27:37,650 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-2:null) (logid:91b4e1df) Attempting to create storage
pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 16:27:37,652 WARN [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-2:null) (logid:91b4e1df) Storage pool
be80af6a-7201-3410-8da4-9b3b58c4954f was not found running in libvirt. Need to
create it.
2018-04-06 16:27:37,653 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-2:null) (logid:91b4e1df) Didn't find an existing storage
pool be80af6a-7201-3410-8da4-9b3b58c4954f by UUID, checking for pools with
duplicate paths
2018-04-06 16:27:37,664 ERROR [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-2:null) (logid:91b4e1df) Failed to create RBD storage
pool: org.libvirt.LibvirtException: failed to connect to the RADOS monitor on:
storagepool1:6789,: Operation not supported
2018-04-06 16:27:42,762 INFO [cloud.agent.Agent] (Agent-Handler-4:null)
(logid:) Lost connection to the server. Dealing with the remaining commands...
Exactly the same pool was working before the upgrade:
2018-04-06 12:53:52,847 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-3:null) (logid:14dace5e) Attempting to create storage
pool be80af6a-7201-3410-8da4-9b3b58c4954f (RBD) in libvirt
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-3:null) (logid:14dace5e) Found existing defined storage
pool be80af6a-7201-3410-8da4-9b3b58c4954f, using it.
2018-04-06 12:53:52,850 INFO [kvm.storage.LibvirtStorageAdaptor]
(agentRequest-Handler-3:null) (logid:14dace5e) Trying to fetch storage pool
be80af6a-7201-3410-8da4-9b3b58c4954f from libvirt
2018-04-06 12:53:53,171 INFO [cloud.agent.Agent] (agentRequest-Handler-2:null)
(logid:14dace5e) Proccess agent ready command, agent id = 46
To narrow down the issue and rule out system-related problems, I attached the
pool directly to libvirt using the following XML config, and it worked as
expected:
<pool type="rbd">
  <name>be80af6a-7201-3410-8da4-9b3b58c4954f</name>
  <source>
    <name>cephstor1</name>
    <host name='storagepool1' port='6789'/>
    <auth username='admin' type='ceph'>
      <secret uuid='XXXXX'/>
    </auth>
  </source>
</pool>
virsh pool-create test.xml
Pool be80af6a-7201-3410-8da4-9b3b58c4954f created from test.xml
root@compute6:~# virsh pool-info be80af6a-7201-3410-8da4-9b3b58c4954f
Name: be80af6a-7201-3410-8da4-9b3b58c4954f
UUID: 47afe7d4-61cb-46c5-a642-93712c758b5c
State: running
Persistent: no
Autostart: no
Capacity: 10.05 TiB
Allocation: 2.22 TiB
Available: 2.71 TiB
That being said, the issue looks related to the way the CloudStack scripts
interface with the libvirt daemon.
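
For anyone wanting to repeat the manual check on another node, the pool
definition can be generated rather than typed by hand. The following is only a
minimal sketch using the Python standard library; the function name
build_rbd_pool_xml and its parameters are hypothetical, not part of CloudStack
or libvirt, and the values are the ones from this report (the secret UUID is
left as the XXXXX placeholder):

```python
import xml.etree.ElementTree as ET


def build_rbd_pool_xml(pool_name, ceph_pool, mon_host, mon_port,
                       auth_user, secret_uuid):
    """Build a libvirt RBD storage pool definition and return it as a string.

    Hypothetical helper for illustration only; it mirrors the element layout
    of the hand-written test.xml in this report.
    """
    pool = ET.Element("pool", {"type": "rbd"})
    ET.SubElement(pool, "name").text = pool_name

    source = ET.SubElement(pool, "source")
    ET.SubElement(source, "name").text = ceph_pool
    # A single monitor host, as in the report.
    ET.SubElement(source, "host", {"name": mon_host, "port": str(mon_port)})

    auth = ET.SubElement(source, "auth",
                         {"username": auth_user, "type": "ceph"})
    ET.SubElement(auth, "secret", {"uuid": secret_uuid})

    return ET.tostring(pool, encoding="unicode")


if __name__ == "__main__":
    # Values taken from the report; XXXXX stands in for the real secret UUID.
    print(build_rbd_pool_xml(
        "be80af6a-7201-3410-8da4-9b3b58c4954f",
        "cephstor1",
        "storagepool1",
        6789,
        "admin",
        "XXXXX",
    ))
```

The printed XML can be saved as test.xml and passed to virsh pool-create as
shown above. This only reproduces the manual test; it does not show how the
CloudStack agent itself builds the pool definition.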
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)