On Tue, Feb 14, 2017 at 5:27 AM, Tyanko Aleksiev <tyanko.alex...@gmail.com> wrote:
> Hi Cephers,
>
> At University of Zurich we are using Ceph as a storage back-end for our
> OpenStack installation. Since we recently reached 70% occupancy (mostly
> caused by the cinder pool, served by 16384 PGs) we are in the phase of
> extending the cluster with additional storage nodes of the same type
> (except for a slightly more powerful CPU).
>
> We decided to opt for a gradual OSD deployment: we created a temporary
> "root" bucket called "fresh-install" containing the newly installed nodes,
> and then we moved OSDs from this bucket to the current production root via:
>
> ceph osd crush set osd.{id} {weight} host={hostname} root={production_root}
>
> Everything seemed nicely planned, but when we started adding a few new
> OSDs to the cluster, thus triggering a rebalance, one of the OSDs, already
> at 84% disk use, passed the 85% threshold. This in turn triggered the
> "near full osd(s)" warning, and more than 20 PGs previously in the
> "wait_backfill" state were marked as "wait_backfill+backfill_toofull".
> Since the OSD kept growing until it reached 90% disk use, we decided to
> reduce its relative weight from 1 to 0.95. That recalculated the crushmap
> and remapped a few PGs, but did not appear to move any data off the
> almost-full OSD. Only when, in steps of 0.05, we reached a relative weight
> of 0.50 was data moved and some "backfill_toofull" requests released.
> However, we had to go down almost to a relative weight of 0.10 to trigger
> some additional data movement and finally finish the backfilling.
>
> We are now adding new OSDs, but the problem is constantly triggered, since
> we have multiple OSDs above 83% that start growing during the rebalance.
>
> My questions are:
>
> - Is there something wrong with our process of adding new OSDs (some
>   additional details below)?

It could work, but it could also be more disruptive than it needs to be. We
have a similar situation/configuration, and what we do is start OSDs with
`osd crush initial weight = 0` as well as `osd crush location` set properly.
This brings the OSDs up at crush weight 0 and lets us bring them in in a
controlled fashion: we bring the reweight up to 1 (no disruption), then
increase the crush weight gradually (rough sketch below, after your cluster
details).

> - We also noticed that the problem has the tendency to cluster around the
>   newly added OSDs, so could those two things be correlated?

I'm not sure which problem you are referring to; these OSDs filling up?
Possibly due to temporary files, or some other mechanism I'm not familiar
with, adding a little extra data on top.

> - Why does reweighting not trigger instant data movement? What's the logic
>   behind remapped PGs? Is there some sort of flat queue of tasks, or does
>   it have some priorities defined?

It should; perhaps you aren't choosing large enough increments, or perhaps
you have some settings getting in the way.

> - Did somebody experience this situation, and eventually how was it
>   solved/bypassed?

FWIW, we also run a rebalance cronjob every hour with the following:
`ceph osd reweight-by-utilization 103 .010 10`. It was detailed in another
recent thread on [ceph-users].

> Cluster details are as follows:
>
> - version: 0.94.9
> - 5 monitors,
> - 40 storage hosts with an overall of 24 x 4 TB disks: 1 OSD/disk (960 OSDs
>   in total),
> - osd pool default size = 3,
> - journaling is on SSDs.
>
> We have a "hosts" failure domain.
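Coming back to the `osd crush initial weight = 0` approach I mentioned above,
here is a minimal sketch of the pieces involved. The osd id (960) and the
host/root names are only examples borrowed from your mail, and the exact
`osd crush location` syntax is from memory, so double-check it against the
docs for your 0.94 version before relying on it:

    # ceph.conf on the new storage nodes: new OSDs come up at crush weight 0
    [osd]
    osd crush initial weight = 0
    osd crush location = root=sas host=osd-k7-41

    # once the new OSD is up and in, bring its reweight to 1; with the crush
    # weight still at 0 this does not trigger any data movement
    ceph osd reweight 960 1.0

    # then raise the crush weight in small, controlled steps, for example:
    ceph osd crush reweight osd.960 0.2

The point of the two-step approach is that the reweight itself is harmless
while the crush weight is 0; all the actual data movement happens in
increments you choose when you raise the crush weight.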
> Relevant crushmap details:
>
> # rules
> rule sas {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take sas
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> root sas {
>         id -41          # do not change unnecessarily
>         # weight 3283.279
>         alg straw
>         hash 0  # rjenkins1
>         item osd-l2-16 weight 87.360
>         item osd-l4-06 weight 87.360
>         ...
>         item osd-k7-41 weight 14.560
>         item osd-l4-36 weight 14.560
>         item osd-k5-36 weight 14.560
> }
>
> host osd-k7-21 {
>         id -46          # do not change unnecessarily
>         # weight 87.360
>         alg straw
>         hash 0  # rjenkins1
>         item osd.281 weight 3.640
>         item osd.282 weight 3.640
>         item osd.285 weight 3.640
>         ...
> }
>
> host osd-k7-41 {
>         id -50          # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.900 weight 3.640
>         item osd.901 weight 3.640
>         item osd.902 weight 3.640
>         item osd.903 weight 3.640
> }
>
> As mentioned before, we created a temporary bucket called "fresh-install"
> containing the newly installed nodes, i.e.:
>
> root fresh-install {
>         id -34          # do not change unnecessarily
>         # weight 218.400
>         alg straw
>         hash 0  # rjenkins1
>         item osd-k5-36-fresh weight 72.800
>         item osd-k7-41-fresh weight 72.800
>         item osd-l4-36-fresh weight 72.800
> }
>
> Then, in steps of 6 OSDs (2 OSDs from each new host), we move OSDs from
> the "fresh-install" to the "sas" bucket.

I would highly recommend a simple script to weight in gradually, as described
above (rough sketch at the end of this mail). Much more controllable, and you
can twiddle the knobs to your heart's desire.

> Thank you in advance for all the suggestions.
>
> Cheers,
> Tyanko
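As for the "simple script to weight in gradually", something along these
lines is what I had in mind. Treat it purely as a sketch: osd.960 and the
3.640 target weight are just examples taken from your crushmap, and the step
size, the sleep interval and the crude `ceph health | grep` check are all
knobs you would want to adapt to your cluster:

    #!/bin/bash
    # Gradually raise the crush weight of one OSD, waiting for backfill to
    # settle between steps. Sketch only; adjust OSD/TARGET/STEP to your setup.
    OSD=osd.960       # example: one of the freshly added OSDs
    TARGET=3.640      # example: full per-disk crush weight from your map
    STEP=0.2
    WEIGHT=0

    while awk -v w="$WEIGHT" -v t="$TARGET" 'BEGIN { exit !(w < t) }'; do
        WEIGHT=$(awk -v w="$WEIGHT" -v s="$STEP" -v t="$TARGET" \
                 'BEGIN { w += s; if (w > t) w = t; print w }')
        ceph osd crush reweight "$OSD" "$WEIGHT"

        # give the cluster a moment to start remapping, then wait until the
        # backfill triggered by this step has finished before the next bump
        sleep 60
        while ceph health | grep -Eq 'backfill|recover'; do
            sleep 60
        done
    done

Combined with `osd crush initial weight = 0`, the new OSDs can sit directly
in the "sas" root from the start, and the loop controls how quickly data
flows onto them, so each increment stays small.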
Hope that helps.

--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com