Is this a problem with your PGs being placed unevenly, with your PGs being sized very differently, or both?
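One quick way to tell which it is: tally PGs and bytes per OSD from "ceph pg dump --format=json". Below is a rough sketch along those lines; the JSON field names (pg_stats, acting, stat_sum/num_bytes) are what I'd expect from a Firefly-era dump, so treat them as assumptions and adjust if your output differs:

#!/usr/bin/env python
# Rough sketch: pipe `ceph pg dump --format=json` into this script and it
# prints per-OSD PG counts plus the spread of PG sizes, to separate
# "uneven placement" from "uneven PG sizes". The JSON field names used
# here (pg_stats, acting, stat_sum/num_bytes) are assumptions and may
# need adjusting for your release.
import json
import sys
from collections import defaultdict

dump = json.load(sys.stdin)

pgs_per_osd = defaultdict(int)    # how many PGs land on each OSD
bytes_per_osd = defaultdict(int)  # how many bytes those PGs add up to
pg_sizes = []                     # size of every PG, for the size spread

for pg in dump.get("pg_stats", []):
    size = pg.get("stat_sum", {}).get("num_bytes", 0)
    pg_sizes.append(size)
    for osd in pg.get("acting", []):
        pgs_per_osd[osd] += 1
        bytes_per_osd[osd] += size

if pg_sizes:
    print("PG size (bytes): min %d  max %d  avg %.0f"
          % (min(pg_sizes), max(pg_sizes),
             float(sum(pg_sizes)) / len(pg_sizes)))

counts = list(pgs_per_osd.values())
if counts:
    print("PGs per OSD: min %d  max %d  avg %.1f"
          % (min(counts), max(counts), float(sum(counts)) / len(counts)))

for osd in sorted(pgs_per_osd):
    print("osd.%s: %d PGs, %.1f GB"
          % (osd, pgs_per_osd[osd], bytes_per_osd[osd] / 1e9))

If the per-OSD PG counts are tight but the per-OSD byte totals are not, the imbalance is coming from PG sizes rather than placement; if the counts themselves vary a lot, it's placement.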
CRUSH is never going to balance perfectly, but the numbers you're quoting look a bit worse than usual at first glance.
-Greg

On Tue, Apr 7, 2015 at 8:16 PM J David <j.david.li...@gmail.com> wrote:
> Getting placement groups to be placed evenly continues to be a major
> challenge for us, bordering on impossible.
>
> When we first reported trouble with this, the Ceph cluster had 12
> OSDs (each an Intel DC S3700 400GB) spread across three nodes. Since
> then, it has grown to 8 nodes with 38 OSDs.
>
> The average utilization is 80%. With weights all set to 1, utilization
> varies from 53% to 96%. Immediately after "ceph osd
> reweight-by-utilization 105" it varies from 61% to 90%. Essentially,
> once utilization goes over 75%, managing the OSD weights to keep all
> of them under 90% becomes a full-time job.
>
> This is on 0.80.9 with optimal tunables (including the
> chooseleaf_vary_r=1 and straw_calc_version=1 settings). The pool has
> 2048 placement groups and has size=2.
>
> What, if anything, can we do about this? The goals are twofold, and
> in priority order:
>
> 1) Guarantee that the cluster can survive the loss of a node without
> dying because one "unlucky" OSD overfills.
>
> 2) Utilize the available space as efficiently as possible. We are
> targeting 85% utilization, but currently things get ugly pretty
> quickly over 75%.
>
> Thanks for any advice!
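For a rough sense of how much imbalance to expect even in the best case: treating each of the 2048 x 2 placements as an independent random pick among the 38 OSDs (a crude stand-in for what CRUSH actually does, so take this only as a back-of-the-envelope estimate) already predicts about a 10% one-sigma spread in PG count per OSD:

import math

num_pgs = 2048    # pool pg_num from the report above
pool_size = 2     # size=2
num_osds = 38

placements = num_pgs * pool_size              # 4096 PG replicas to place
p = 1.0 / num_osds                            # chance a replica lands on a given OSD
mean = placements * p                         # ~107.8 PGs per OSD on average
stddev = math.sqrt(placements * p * (1 - p))  # binomial standard deviation, ~10.2

print("expected PGs per OSD: %.1f +/- %.1f (%.0f%%)"
      % (mean, stddev, 100 * stddev / mean))
# -> expected PGs per OSD: 107.8 +/- 10.2 (10%)

So an OSD two standard deviations out is already roughly 20% away from the mean before any PG-size skew, which is why some reweighting is normal; a 53-96% utilization range is still wider than that simple model suggests.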