Hi,

There are many threads discussing recovery throughput; have you tried any of the solutions suggested there? The first thing to try is to increase osd_recovery_max_active and osd_max_backfills. What are the current values in your cluster?
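
For example (a minimal sketch, assuming you manage settings via the centralized config database; the values are only illustrative, not a recommendation):

  # show the current defaults for all OSDs
  ceph config get osd osd_max_backfills
  ceph config get osd osd_recovery_max_active

  # raise them cluster-wide, then watch recovery in "ceph -s" / "ceph -w"
  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 5

Also note that on Quincy the mClock scheduler is the default and may limit the effect of these options; if that turns out to be the case for you, osd_mclock_profile=high_recovery_ops is another knob worth a look (I'm assuming a default setup here, check the docs for your exact release).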


Quoting Monish Selvaraj <mon...@xaasability.com>:

Hi,

Our Ceph cluster consists of 20 hosts and 240 OSDs.

We use an erasure-coded pool with the cache-pool (cache tiering) concept.

Some time back, 2 hosts went down and some PGs went into a degraded state. We brought the 2 hosts back up after a while. The PGs then started recovering, but it is taking a very long time (months). While this was happening the cluster held 664.4 M objects and 987 TB of data. The recovery status has not changed; it remains at 88 PGs degraded.

During this period, we increased the PG count from 256 to 512 for the data pool (the erasure-coded pool).

Over about one week we have also observed that recovery is very slow; the current recovery rate is around 750 MiB/s.

Is there any way to increase this recovery throughput?

*Ceph version: Quincy*




