Actually, I tried every approach I could find in the Ceph docs and on the mailing lists, but none of them had any effect. As a last resort I changed pg/pgp.
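For reference, the change was along these lines (a sketch; "rbd" here is just a placeholder for my pool name):

  ceph osd pool set rbd pg_num 2048    # create the additional placement groups
  ceph osd pool set rbd pgp_num 2048   # start actually placing data into them

As far as I understand, raising pgp_num remaps existing PGs onto the OSDs, which is presumably why the rebalance percentage fell back after the change.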
Anyway… what is the best way to solve this problem?

Thanks

> On Jul 3, 2016, at 1:43 PM, Wido den Hollander <w...@42on.com> wrote:
>
>> On 3 July 2016 at 11:02, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>
>> Yes, you're right, but I had 0 objects/s of recovery last night. When I
>> changed pg/pgp from 1400 to 2048, rebalancing sped up, but the rebalancing
>> percentage went back to 53%.
>
> Why did you change that? I would not change that value while a cluster is
> still in recovery.
>
>> I have run into this situation again and again since I dropped the failed
>> OSD: whenever I increase pg/pgp, rebalancing ends up stuck at 0 objects/s
>> with a low transfer speed.
>
> Hard to judge at this point. You might want to try restarting osd.27 and
> see if that gets things going again. It seems to be involved in many of the
> PGs that are in the 'backfilling' state.
>
> Wido
>
>> Thanks
>>
>>> On Jul 3, 2016, at 1:25 PM, Wido den Hollander <w...@42on.com> wrote:
>>>
>>>> On 3 July 2016 at 10:50, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>
>>>> Thanks for the quick response, Wido.
>>>>
>>>> The "ceph -s" output is pasted here:
>>>> http://pastie.org/10897747
>>>>
>>>> And this is the output of "ceph health detail":
>>>> http://pastebin.com/vMeURWC9
>>>
>>> It seems the cluster is still backfilling PGs, and your 'ceph -s' shows
>>> so: 'recovery io 62375 kB/s, 15 objects/s'
>>>
>>> It will just take some time before it finishes.
>>>
>>> Wido
>>>
>>>> Thank you
>>>>
>>>>> On Jul 3, 2016, at 1:10 PM, Wido den Hollander <w...@42on.com> wrote:
>>>>>
>>>>>> On 3 July 2016 at 10:34, Roozbeh Shafiee <roozbeh.shaf...@gmail.com> wrote:
>>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> A few days ago one of my OSDs failed and I dropped it from the
>>>>>> cluster, but I have been in HEALTH_WARN ever since. After turning off
>>>>>> the OSD, the self-healing system started to rebalance data across the
>>>>>> other OSDs.
>>>>>>
>>>>>> My question is: at the end of rebalancing, the process does not
>>>>>> complete, and I get this message at the end of the "ceph -s" output:
>>>>>>
>>>>>> recovery io 1456 KB/s, 0 object/s
>>>>>
>>>>> Could you post the exact output of 'ceph -s'?
>>>>>
>>>>> There is something more that needs to be shown.
>>>>>
>>>>> 'ceph health detail' might also tell you more.
>>>>>
>>>>> Wido
>>>>>
>>>>>> How can I get back to HEALTH_OK again?
>>>>>>
>>>>>> My cluster details are:
>>>>>>
>>>>>> - 27 OSDs
>>>>>> - 3 MONs
>>>>>> - 2048 pg/pgp
>>>>>> - Each OSD has 4 TB of space
>>>>>> - CentOS 7.2 with the 3.10 Linux kernel
>>>>>> - Ceph Hammer
>>>>>>
>>>>>> Thank you,
>>>>>> Roozbeh
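PS: Following Wido's suggestion, I will try restarting osd.27 first. If I read the docs correctly, on CentOS 7.2 with Hammer that is something like one of these, depending on how the daemons are managed on that node:

  # with the sysvinit/service wrapper that Hammer ships
  sudo /etc/init.d/ceph restart osd.27

  # or, on hosts where the OSDs run under systemd units
  sudo systemctl restart ceph-osd@27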
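In the meantime I am watching the recovery with the following (all standard commands, as far as I know; <pgid> is a placeholder for a PG ID taken from the health output):

  ceph -s                      # overall status, including the recovery io line
  ceph health detail           # lists the PGs that are degraded or backfilling
  ceph pg dump_stuck unclean   # only the PGs that are stuck unclean
  ceph pg <pgid> query         # details for one PG, e.g. which OSDs it involves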
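For completeness, when I dropped the failed OSD I followed the usual removal steps from the docs, roughly as below (<id> being the failed OSD's number, not osd.27):

  ceph osd out <id>                # stop new data from being mapped to it
  # stop the daemon on its host, then:
  ceph osd crush remove osd.<id>   # remove it from the CRUSH map
  ceph auth del osd.<id>           # delete its authentication key
  ceph osd rm <id>                 # remove the OSD from the cluster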