2016-07-14 15:26 GMT+02:00 Luis Periquito <periqu...@gmail.com>:
> Hi Jaroslaw,
>
> Several things spring to mind. I'm assuming the cluster is
> healthy (other than the slow requests), right?

Yes.
> From the (little) information you sent it seems the pools are
> replicated with size 3, is that correct?

True.

> Are there any long-running delete processes? They usually have a
> negative impact on performance, especially as they don't really show
> up in the IOPS statistics.

During normal throughput we have a small amount of deletes.

> I've also seen something like this happen when there's a slow
> disk/OSD. You can try to check with "ceph osd perf" and look for
> higher numbers. Usually restarting that OSD brings the cluster back
> to life, if that's the issue.

I will check this.

> If nothing shows, try a "ceph tell osd.* version"; if there's a
> misbehaving OSD it usually doesn't respond to the command (slow or
> even timing out).
>
> You also don't say how many scrub/deep-scrub processes are running.
> If not properly handled they are also a performance killer.

Scrub/deep-scrub processes are disabled.

> Last, but by far not least, have you ever thought of creating an SSD
> pool (even a small one) and moving all pools but .rgw.buckets there?
> The other ones are small enough, but would enjoy having their own
> "reserved" OSDs...

This is an idea we had some time ago; we will try that.

One important thing:

sysop@s41617:~/bin$ ceph osd pool get .rgw.buckets pg_num
pg_num: 4470
sysop@s41617:~/bin$ ceph osd pool get .rgw.buckets.index pg_num
pg_num: 2048

Could this be the main problem?

Regards
--
Jarek
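As a rough sanity check of those pg_num values (the OSD count used below is only an assumed figure for illustration, it is not given in the thread), the common guideline is on the order of 100 PGs per OSD across all pools, counting replicas:

    # per-OSD PG load  ~=  sum over pools of (pg_num * size) / number of OSDs
    # .rgw.buckets:       4470 * 3 = 13410 PG copies
    # .rgw.buckets.index: 2048 * 3 =  6144 PG copies
    # with an assumed 100 OSDs: (13410 + 6144) / 100 ~= 196 PGs per OSD
    # from these two pools alone, before the remaining pools are counted

    ceph osd stat                # number of OSDs in the cluster
    ceph osd df                  # per-OSD utilisation, including the PGS column
    ceph osd dump | grep ^pool   # pg_num, pgp_num and size for every pool

Note also that 4470 is not a power of two, so the PGs in that pool will be of uneven size.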