Hi all,

We have a problem on our production cluster running Nautilus (14.2.22).

The cluster is almost full, and a few months ago we started noticing slow peering: when we restart any OSD (or host), the peering process takes hours instead of minutes.

We noticed that one pool holds about 90k objects (300 GB) per PG, so we decided to increase pg_num on that pool so that each individual PG peers more quickly. While that change was in progress, some PGs got stuck inactive for hours with peering never finishing, and some OSDs went down with the error described in https://tracker.ceph.com/issues/51168
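
The reasoning behind the pg_num increase can be sketched roughly as follows. This assumes objects and data are spread evenly across PGs, and the split factors are illustrative only, not the values we actually used:

```python
def per_pg_after_split(objects_per_pg, gb_per_pg, factor):
    """Estimate per-PG load after multiplying pg_num by `factor`.

    Assumes an even distribution of objects and data across PGs,
    which a real cluster only approximates.
    """
    return objects_per_pg / factor, gb_per_pg / factor


# Starting point taken from the figures above: ~90k objects, ~300 GB per PG.
for factor in (2, 4, 8):  # hypothetical pg_num multipliers
    objs, gb = per_pg_after_split(90_000, 300, factor)
    print(f"pg_num x{factor}: ~{objs:,.0f} objects, ~{gb:.1f} GB per PG")
```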

We then restarted all OSDs and waited, but the slow peering persists.

Is there any way to get the cluster healthy again? Or can we disable peering for one pool, so that the other pools holding RBD images peer and come online first, and then try to peer the big pool afterwards?

Thank you for your help; this is an urgent situation.

With regards
Jan Pekar

--

============
Ing. Jan Pekař
jan.pe...@imatic.cz
----
Imatic | Jagellonská 14 | Praha 3 | 130 00
https://www.imatic.cz | +420326555326
============