On Tue, Feb 14, 2017 at 5:27 AM, Tyanko Aleksiev <tyanko.alex...@gmail.com>
wrote:

> Hi Cephers,
>
> At the University of Zurich we are using Ceph as the storage back-end for
> our OpenStack installation. Since we recently reached 70% occupancy
> (mostly due to the cinder pool, served by 16384 PGs), we are in the
> process of extending the cluster with additional storage nodes of the same
> type (except for slightly more powerful CPUs).
>
> We opted for a gradual OSD deployment: we created a temporary "root"
> bucket called "fresh-install" containing the newly installed nodes, and
> then moved OSDs from this bucket to the current production root via:
>
> ceph osd crush set osd.{id} {weight} host={hostname} root={production_root}
>
> Everything seemed nicely planned, but when we started adding a few new
> OSDs to the cluster, and thus triggering a rebalance, one of the OSDs,
> already at 84% disk use, passed the 85% threshold. This in turn
> triggered the "near full osd(s)" warning, and more than 20 PGs previously
> in the "wait_backfill" state were marked "wait_backfill+backfill_toofull".
> Since the OSD kept growing until it reached 90% disk use, we decided to
> reduce its relative weight from 1 to 0.95.
> This last action recalculated the crushmap and remapped a few PGs, but did
> not appear to move any data off the almost-full OSD. Only when, in steps
> of 0.05, we reached a relative weight of 0.50 was data moved and some
> "backfill_toofull" PGs released. However, we had to go down almost to a
> relative weight of 0.10 in order to trigger some additional data movement
> and have the backfilling finally finish.
>
> We are now adding more OSDs, but the problem keeps being triggered since
> we have multiple OSDs above 83% that start growing during the rebalance.
>
> My questions are:
>
> - Is there something wrong with our process of adding new OSDs (some
> additional details below)?
>
>
It could work, but it could also be more disruptive than it needs to be. We
have a similar situation/configuration, and what we do is start OSDs with
`osd crush initial weight = 0` as well as the crush location set properly.
This brings the OSDs up with a crush weight of 0, so we can bring them in
in a controlled fashion. We set their reweight to 1 (no disruption, since
the crush weight is still 0), then crush-weight them in gradually.
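
For reference, a minimal sketch of what that looks like (the OSD id, host
and root names here are just placeholders; adjust to your layout):

    # ceph.conf on the new OSD nodes
    [osd]
    osd crush initial weight = 0
    osd crush location = "host=osd-x1-01 root=sas"

    # once the OSD daemon is up, set its reweight to 1 -- no data moves
    # yet because the crush weight is still 0
    ceph osd reweight 960 1.0

    # then raise the crush weight in small steps, e.g.
    ceph osd crush reweight osd.960 0.5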


> - We also noticed that the problem tends to cluster around the newly
> added OSDs, so could those two things be correlated?
>
I'm not sure which problem you are referring to - the OSDs filling up?
Possibly temporary files, or some other mechanism I'm not familiar with,
adding a little extra data on top.

> - Why does reweighting not trigger immediate data movement? What's the
> logic behind remapped PGs? Is there some sort of flat queue of tasks, or
> are there priorities defined?
>
>
It should. Perhaps you aren't choosing large enough increments, or perhaps
you have some settings getting in the way.
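
A couple of things worth checking (a sketch; osd.281 is just an example
daemon, and the admin-socket command has to run on the node hosting it):

    # the ratio above which an OSD refuses backfill (default 0.85)
    ceph daemon osd.281 config get osd_backfill_full_ratio

    # per-OSD utilization, to see which OSDs are close to the threshold
    ceph osd df

    # PGs currently held back because their target OSD is too full
    ceph pg dump | grep backfill_toofull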


> - Did somebody else experience this situation, and if so, how was it
> solved/bypassed?
>
>
FWIW, we also run a rebalance cronjob every hour with the following:

`ceph osd reweight-by-utilization 103 .010 10`

It was detailed in another recent thread on [ceph-users].
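
For completeness, the cron entry itself is nothing fancy (a sketch; the file
path is arbitrary, and roughly speaking 103 is the overload threshold as a
percentage of mean utilization, .010 the maximum reweight change per OSD,
and 10 the maximum number of OSDs touched per run):

    # /etc/cron.d/ceph-rebalance (sketch)
    0 * * * *  root  /usr/bin/ceph osd reweight-by-utilization 103 .010 10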


> Cluster details are as follows:
>
> - version: 0.94.9
> - 5 monitors,
> - 40 storage hosts, each with 24 x 4 TB disks and 1 OSD per disk (960 OSDs
> in total),
> - osd pool default size = 3,
> - journaling is on SSDs.
>
> We use "host" as the failure domain. Relevant crushmap details:
>
> # rules
> rule sas {
>         ruleset 1
>         type replicated
>         min_size 1
>         max_size 10
>         step take sas
>         step chooseleaf firstn 0 type host
>         step emit
> }
>
> root sas {
>         id -41          # do not change unnecessarily
>         # weight 3283.279
>         alg straw
>         hash 0  # rjenkins1
>         item osd-l2-16 weight 87.360
>         item osd-l4-06 weight 87.360
>         ...
>         item osd-k7-41 weight 14.560
>         item osd-l4-36 weight 14.560
>         item osd-k5-36 weight 14.560
> }
>
> host osd-k7-21 {
>         id -46          # do not change unnecessarily
>         # weight 87.360
>         alg straw
>         hash 0  # rjenkins1
>         item osd.281 weight 3.640
>         item osd.282 weight 3.640
>         item osd.285 weight 3.640
>         ...
> }
>
> host osd-k7-41 {
>         id -50          # do not change unnecessarily
>         # weight 14.560
>         alg straw
>         hash 0  # rjenkins1
>         item osd.900 weight 3.640
>         item osd.901 weight 3.640
>         item osd.902 weight 3.640
>         item osd.903 weight 3.640
> }
>
>
> As mentioned before we created a temporary bucket called "fresh-install"
> containing the newly installed nodes (i.e.):
>
> root fresh-install {
>         id -34          # do not change unnecessarily
>         # weight 218.400
>         alg straw
>         hash 0  # rjenkins1
>         item osd-k5-36-fresh weight 72.800
>         item osd-k7-41-fresh weight 72.800
>         item osd-l4-36-fresh weight 72.800
> }
>
> Then, in steps of 6 OSDs (2 from each new host), we move OSDs from the
> "fresh-install" bucket to the "sas" bucket.
>
>
I would highly recommend a simple script to weight them in gradually, as
described above. It's much more controllable, and you can twiddle the knobs
to your heart's desire.
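
Something along these lines would do (a rough sketch only; the OSD ids,
step size and target weight are placeholders you would set for your
hardware):

    #!/bin/bash
    # Gradually raise the crush weight of newly added OSDs, waiting for
    # backfill/recovery to settle between steps.
    OSDS="960 961 962"   # ids of the new OSDs (placeholder)
    TARGET=3.640         # final crush weight for a ~4 TB disk
    STEP=0.2             # weight increment per round

    for w in $(seq $STEP $STEP $TARGET); do
        for id in $OSDS; do
            ceph osd crush reweight osd.$id $w
        done
        # wait until the data movement caused by this step has finished
        while ceph health | grep -qE 'backfill|recover'; do
            sleep 60
        done
    done

    # final nudge to the exact target weight
    for id in $OSDS; do
        ceph osd crush reweight osd.$id $TARGET
    done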

>
> Thank you in advance for all the suggestions.
>
> Cheers,
> Tyanko
>
Hope that helps.

-- 
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.and...@dreamhost.com | www.dreamhost.com