Hi,

We have 5 OSD servers with 10 OSDs each (journals on enterprise SSDs).
We lost an OSD and the cluster started backfilling its data to the remaining OSDs. During the backfill, latency skyrocketed on some OSDs and connected clients experienced massive IO wait.

I'm trying to rectify the situation now, and from what I can tell these are the settings that might help:

osd client op priority
osd recovery op priority
osd max backfills
osd recovery max active

1. Does a higher priority value mean higher priority (relative to the other setting having a lower value)? Or does a priority of 1 mean the highest priority?

2. I'm running with the defaults for these settings. Does anyone have experience changing them?

Kind Regards,
David Majchrzak
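
P.S. For reference, this is roughly how I understand the four settings would be written in the [osd] section of ceph.conf. It is only a sketch: the values are untested placeholders, not recommendations, and they assume that a higher number means higher priority, which is exactly what I'm unsure about in question 1.

```ini
[osd]
# Placeholder values for illustration only; not tested on our cluster.
# Assumes a higher number means higher priority (see question 1).
osd client op priority = 63
osd recovery op priority = 1

# Assumed intent: limit concurrent backfill/recovery work per OSD.
osd max backfills = 1
osd recovery max active = 1
```

If the assumption about priority direction is wrong, the two priority values would need to be swapped.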