Hi Wido,

Is there a threshold for the "size" you just mentioned?
Thanks a lot.

2014-10-01 19:48 GMT+08:00 Wido den Hollander <w...@widodh.nl>:

> On 10/01/2014 01:43 PM, Indra Pramana wrote:
> > Hi Wido,
> >
> > Can you elaborate more on what you mean by the size of our cluster? Is
> > it because the cluster size is too big, or too small?
> >
>
> I think it's probably because the Ceph cluster is too small. That causes
> too much stress on the other nodes during recovery.
>
> That again leads to libvirt and Qemu not being able to talk to Ceph,
> which leads to slow I/O.
>
> I've seen multiple occasions where Ceph clusters are recovering but
> CloudStack is still working just fine.
>
> Wido
>
> > Thank you.
> >
> > On Wed, Oct 1, 2014 at 4:28 PM, Wido den Hollander <w...@widodh.nl>
> > wrote:
> >
> >> On 10/01/2014 09:21 AM, Indra Pramana wrote:
> >>> Dear all,
> >>>
> >>> Anyone using CloudStack with Ceph RBD as primary storage? I am using
> >>> CloudStack 4.2.0 with KVM hypervisors and Ceph latest stable version of
> >>> dumpling.
> >>>
> >>
> >> I am :)
> >>
> >>> Based on what I see, when the Ceph cluster is in a degraded state (not
> >>> active+clean), for example because one node is down and recovery is in
> >>> progress, it might affect CloudStack operations. For example:
> >>>
> >>> - A stopped VM cannot be started, because it says it cannot find a
> >>> suitable storage pool.
> >>>
> >>> - A disconnected host cannot be reconnected easily, even after restarting
> >>> agent and libvirt on the agent side, and restarting the management server
> >>> on the server side. Need to keep on trying and suddenly it will be
> >>> connected/up by itself.
> >>>
> >>
> >> It really depends on the size of the cluster. It could be that the Ceph
> >> cluster is so busy with recovery that it can't process the I/O coming
> >> from CloudStack and thus stalls.
> >>
> >> This is not a Ceph or CloudStack problem, but probably the size of your
> >> cluster.
> >>
> >> Wido
> >>
> >>> Once Ceph has recovered and is back to the active+clean state, CloudStack
> >>> operations will be back to normal. Host agents will be up, and VMs can be
> >>> started.
> >>>
> >>> Anyone seeing similar behaviour?
> >>>
> >>> Looking forward to your reply, thank you.
> >>>
> >>> Cheers.
> >>>