On 10/01/2014 01:43 PM, Indra Pramana wrote:
> Hi Wido,
> 
> Can you elaborate on what you mean by the size of our cluster? Is it
> because the cluster size is too big, or too small?
> 

I think it's probably because the Ceph cluster is too small. That puts
too much stress on the remaining nodes during recovery.

That in turn means libvirt and Qemu can't talk to Ceph properly, which
leads to slow I/O.

I've seen multiple occasions where Ceph clusters are recovering but
CloudStack is still working just fine.
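
One thing that can help in this situation is throttling recovery and
backfill on the OSDs so client I/O keeps getting through. A minimal
sketch (the values below are only illustrative, not tuned for your
cluster):

  # ceph.conf on the OSD nodes, picked up when the OSDs restart:
  [osd]
      osd max backfills = 1
      osd recovery max active = 1
      osd recovery op priority = 1

  # or injected into a running cluster without restarting the OSDs:
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

Recovery takes longer with settings like these, but the VMs' I/O is
less likely to stall while the cluster rebuilds.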

Wido

> Thank you.
> 
> On Wed, Oct 1, 2014 at 4:28 PM, Wido den Hollander <w...@widodh.nl> wrote:
> 
>>
>>
>> On 10/01/2014 09:21 AM, Indra Pramana wrote:
>>> Dear all,
>>>
>>> Is anyone using CloudStack with Ceph RBD as primary storage? I am using
>>> CloudStack 4.2.0 with KVM hypervisors and the latest stable release of
>>> Ceph Dumpling.
>>>
>>
>> I am :)
>>
>>> Based on what I see, when the Ceph cluster is in a degraded state (not
>>> active+clean), for example because one node is down and recovery is in
>>> progress, it can affect CloudStack operations. For example:
>>>
>>> - A stopped VM cannot be started, because CloudStack reports that it
>>> cannot find a suitable storage pool.
>>>
>>> - A disconnected host cannot be reconnected easily, even after restarting
>>> the agent and libvirt on the agent side and restarting the management
>>> server on the server side. We need to keep retrying until it suddenly
>>> comes back up by itself.
>>>
>>
>> It really depends on the size of the cluster. It could be that the Ceph
>> cluster is so busy with recovery that it can't process the I/O coming
>> from CloudStack and thus stalls.
>>
>> This is not a Ceph or CloudStack problem, but probably the size of your
>> cluster.
>>
>> Wido
>>
>>> Once Ceph has recovered and is back to the active+clean state, CloudStack
>>> operations return to normal. Host agents come back up, and VMs can be
>>> started.
>>>
>>> Anyone seeing similar behaviour?
>>>
>>> Looking forward to your reply, thank you.
>>>
>>> Cheers.
>>>
>>
> 
