Re: [ceph-users] pg's stuck for 4-5 days after reaching backfill_toofull

Craig Lewis Tue, 11 Nov 2014 13:38:49 -0800

How many OSDs are nearfull?

I've seen Ceph want two toofull OSDs to swap PGs with each other, so
neither backfill could proceed.  In that case, I dynamically raised
mon_osd_nearfull_ratio and osd_backfill_full_ratio a bit, then put them
back to normal once the scheduling deadlock cleared.

Keep in mind that ceph osd reweight is temporary.  If you mark an OSD OUT
then IN, its reweight value is reset to 1.0.  If you need something
persistent, use ceph osd crush reweight osd.NUM <crush_weight>.  Run
ceph osd tree to see the current weights.

I also recommend stepping toward your goal in small increments.  Changing
either weight can cause a lot of unrelated migrations, and changing the
crush weight seems to cause more than changing the osd weight.  I step the
osd weight by 0.125 and the crush weight by 0.05.
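To make the stepping concrete, here is a minimal dry-run sketch in shell.
It only prints the ceph osd reweight (or ceph osd crush reweight) command
for each intermediate weight, so you can run them one at a time and let
backfill settle in between.  The function name plan_reweight and its
argument order are hypothetical (not part of Ceph); the step sizes are the
ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: print (but do not run) the commands needed to walk an
# OSD's weight from CURRENT to TARGET in fixed increments.
#   KIND "reweight" -> ceph osd reweight NUM W            (temporary, 0..1)
#   KIND "crush"    -> ceph osd crush reweight osd.NUM W  (persistent)
# Usage: plan_reweight KIND OSDNUM CURRENT TARGET STEP
plan_reweight() {
    awk -v kind="$1" -v osd="$2" -v cur="$3" -v tgt="$4" -v step="$5" '
    function emit(w) {
        if (kind == "crush")
            printf "ceph osd crush reweight osd.%s %.3f\n", osd, w
        else
            printf "ceph osd reweight %s %.3f\n", osd, w
    }
    BEGIN {
        cur += 0; tgt += 0; step += 0   # force numeric comparisons
        eps = 1e-9                      # absorb floating-point drift, e.g. with 0.05 steps
        dir = (tgt < cur) ? -1 : 1
        w = cur
        # Emit every full step that stays strictly short of the target...
        while ((dir < 0 && w - step > tgt + eps) || (dir > 0 && w + step < tgt - eps)) {
            w += dir * step
            emit(w)
        }
        # ...then land exactly on the target.
        emit(tgt)
    }' < /dev/null
}

# Example: move osd.12 from 1.0 down to 0.75 in 0.125 steps.
plan_reweight reweight 12 1.0 0.75 0.125
```

The idea is to run one printed command, wait for the resulting backfill to
finish, and only then run the next.  For a crush weight you would call it
as, e.g., plan_reweight crush 7 3.64 3.5 0.05.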

On Tue, Nov 11, 2014 at 12:47 PM, Chad Seys <cws...@physics.wisc.edu> wrote:

> Find out which OSD it is:
>
> ceph health detail
>
> Squeeze blocks off the affected OSD:
>
> ceph osd reweight OSDNUM 0.8
>
> Repeat with any OSD which becomes toofull.
>
> Your cluster is only about 50% used, so I think this will be enough.
>
> Then, when it finishes, allow data back onto the OSD:
>
> ceph osd reweight OSDNUM 1
>
> Hopefully ceph will someday be taught to move PGs in a better order!
> Chad.
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
