Can you please provide the output of `ceph status`, `ceph osd tree`, and
`ceph health detail`?  Thank you.
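For reference, one way to collect all of that in one pass (a sketch, assuming admin keyring access on a monitor node; all commands are standard, read-only ceph CLI subcommands) might be:

# Collect the cluster state requested above
ceph status                 # overall cluster and PG state summary
ceph osd tree               # CRUSH layout, up/down and in/out per OSD
ceph health detail          # per-PG health warnings with PG IDs
ceph pg dump_stuck unclean  # list PGs stuck in a non-clean state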

On Tue, Sep 19, 2017 at 2:59 PM Jonas Jaszkowic <
jonasjaszkowic.w...@gmail.com> wrote:

> Hi all,
>
> I have set up a Ceph cluster consisting of one monitor, 32 OSD hosts (1 OSD
> of size 320GB per host) and 16 clients which are reading
> and writing to the cluster. I have one erasure coded pool (shec plugin)
> with k=8, m=4, c=3 and pg_num=256. Failure domain is host.
> I am able to reach a HEALTH_OK state and everything is working as
> expected. The pool was populated with
> 114048 files of different sizes ranging from 1kB to 4GB. Total amount of
> data in the pool was around 3TB. The capacity of the
> pool was around 10TB.
>
> I want to evaluate how Ceph rebalances data after an OSD loss while
> clients are still reading. To do so, I am taking one OSD out of the data
> placement on purpose via *ceph osd out <osd-id>* without adding a
> replacement, i.e. 31 OSDs remain. Ceph notices this and starts to
> rebalance data, which I can observe with the *ceph -w *command.
>
> However, Ceph failed to rebalance the data. The recovery process seemed
> to be stuck at a random point. I waited more than 12 hours, but the
> number of degraded objects did not decrease and some PGs remained stuck.
> Why is this happening? Based on the number of OSDs and the k, m, c
> values, shouldn't there be enough hosts and OSDs to recover from a
> single OSD failure?
>
> Thank you in advance!
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
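As a quick sanity check on the placement arithmetic (my own sketch, not a statement about your cluster): with an erasure-coded pool of k=8, m=4 and failure domain host, each PG needs k+m = 12 chunks on distinct hosts, so 31 remaining single-OSD hosts should in principle still be enough to re-place every chunk:

```python
# Back-of-envelope check (assumptions: failure domain = host, exactly one
# OSD per host, as described in the original mail).
k, m = 8, 4                    # data and coding chunks per PG
hosts_total = 32
hosts_after_out = hosts_total - 1   # one OSD marked out

chunks_per_pg = k + m          # each chunk must land on a distinct host
assert hosts_after_out >= chunks_per_pg   # 31 >= 12: placement possible

# m = 4 coding chunks tolerate up to m lost chunks per PG, and losing one
# OSD costs at most one chunk per PG, so recovery should be feasible.
print(hosts_after_out, chunks_per_pg)
```

So the host count alone does not explain the stuck PGs, which is why the actual `ceph health detail` / `ceph osd tree` output matters here.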