Thanks for the prompt reply. Indeed, I have racks with different weights. Below is the "ceph osd tree" output:
[root@ceph-mon-01 ~]# ceph osd tree
ID CLASS WEIGHT    TYPE NAME                    STATUS REWEIGHT PRI-AFF
-1       272.80426 root default
-7       109.12170     rack Rack11-PianoAlto
-8        54.56085         host ceph-osd-04
30   hdd   5.45609             osd.30               up  1.00000 1.00000
31   hdd   5.45609             osd.31               up  1.00000 1.00000
32   hdd   5.45609             osd.32               up  1.00000 1.00000
33   hdd   5.45609             osd.33               up  1.00000 1.00000
34   hdd   5.45609             osd.34               up  1.00000 1.00000
35   hdd   5.45609             osd.35               up  1.00000 1.00000
36   hdd   5.45609             osd.36               up  1.00000 1.00000
37   hdd   5.45609             osd.37               up  1.00000 1.00000
38   hdd   5.45609             osd.38               up  1.00000 1.00000
39   hdd   5.45609             osd.39               up  1.00000 1.00000
-9        54.56085         host ceph-osd-05
40   hdd   5.45609             osd.40               up  1.00000 1.00000
41   hdd   5.45609             osd.41               up  1.00000 1.00000
42   hdd   5.45609             osd.42               up  1.00000 1.00000
43   hdd   5.45609             osd.43               up  1.00000 1.00000
44   hdd   5.45609             osd.44               up  1.00000 1.00000
45   hdd   5.45609             osd.45               up  1.00000 1.00000
46   hdd   5.45609             osd.46               up  1.00000 1.00000
47   hdd   5.45609             osd.47               up  1.00000 1.00000
48   hdd   5.45609             osd.48               up  1.00000 1.00000
49   hdd   5.45609             osd.49               up  1.00000 1.00000
-6       109.12170     rack Rack15-PianoAlto
-3        54.56085         host ceph-osd-02
10   hdd   5.45609             osd.10               up  1.00000 1.00000
11   hdd   5.45609             osd.11               up  1.00000 1.00000
12   hdd   5.45609             osd.12               up  1.00000 1.00000
13   hdd   5.45609             osd.13               up  1.00000 1.00000
14   hdd   5.45609             osd.14               up  1.00000 1.00000
15   hdd   5.45609             osd.15               up  1.00000 1.00000
16   hdd   5.45609             osd.16               up  1.00000 1.00000
17   hdd   5.45609             osd.17               up  1.00000 1.00000
18   hdd   5.45609             osd.18               up  1.00000 1.00000
19   hdd   5.45609             osd.19               up  1.00000 1.00000
-4        54.56085         host ceph-osd-03
20   hdd   5.45609             osd.20               up  1.00000 1.00000
21   hdd   5.45609             osd.21               up  1.00000 1.00000
22   hdd   5.45609             osd.22               up  1.00000 1.00000
23   hdd   5.45609             osd.23               up  1.00000 1.00000
24   hdd   5.45609             osd.24               up  1.00000 1.00000
25   hdd   5.45609             osd.25               up  1.00000 1.00000
26   hdd   5.45609             osd.26               up  1.00000 1.00000
27   hdd   5.45609             osd.27               up  1.00000 1.00000
28   hdd   5.45609             osd.28               up  1.00000 1.00000
29   hdd   5.45609             osd.29               up  1.00000 1.00000
-5        54.56085     rack Rack17-PianoAlto
-2        54.56085         host ceph-osd-01
 0   hdd   5.45609             osd.0                up  1.00000 1.00000
 1   hdd   5.45609             osd.1                up  1.00000 1.00000
 2   hdd   5.45609             osd.2                up  1.00000 1.00000
 3   hdd   5.45609             osd.3                up  1.00000 1.00000
 4   hdd   5.45609             osd.4                up  1.00000 1.00000
 5   hdd   5.45609             osd.5                up  1.00000 1.00000
 6   hdd   5.45609             osd.6                up  1.00000 1.00000
 7   hdd   5.45609             osd.7                up  1.00000 1.00000
 8   hdd   5.45609             osd.8                up  1.00000 1.00000
 9   hdd   5.45609             osd.9                up  1.00000 1.00000
[root@ceph-mon-01 ~]#
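Assuming the replicated rule places one copy per rack (3 replicas across the 3 racks above; I still need to double-check the crush rule), a quick back-of-the-envelope calculation shows why the OSDs under Rack17 would fill up about twice as fast as the others. This is only a plain-Python sketch: the rack weights and OSD counts are read off the tree above, and the amount of user data is an invented figure for illustration.

#!/usr/bin/env python3
# Rough sanity check: if the replicated rule places one copy per rack
# (3 replicas across 3 racks), every rack stores the same absolute amount
# of data, so OSD fill level is inversely proportional to the rack weight.
# Rack weights and OSD counts are taken from the `ceph osd tree` above;
# the amount of user data is a made-up number, for illustration only.

racks = {
    "Rack11-PianoAlto": {"weight_tib": 109.12170, "osds": 20},
    "Rack15-PianoAlto": {"weight_tib": 109.12170, "osds": 20},
    "Rack17-PianoAlto": {"weight_tib":  54.56085, "osds": 10},
}

user_data_tib = 50.0          # hypothetical amount of user data
per_rack_tib = user_data_tib  # one full copy per rack (the assumption above)

for name, rack in racks.items():
    per_osd_tib = per_rack_tib / rack["osds"]
    fill = per_rack_tib / rack["weight_tib"]
    print(f"{name:>18}: {per_osd_tib:.2f} TiB per OSD, ~{fill:.0%} of raw capacity")

If that assumption holds, the OSDs in Rack17-PianoAlto carry roughly double the data per TB of the others (in the sketch: 5.00 TiB per OSD at ~92% versus 2.50 TiB per OSD at ~46%), which is in the same ballpark as the 30%/70% usage spread I reported. If the rule replicates host-wise instead, the rack weights should not matter, since all hosts have the same weight.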
On Mon, Jan 14, 2019 at 3:13 PM Dan van der Ster <d...@vanderster.com> wrote:
> On Mon, Jan 14, 2019 at 3:06 PM Massimo Sgaravatto
> <massimo.sgarava...@gmail.com> wrote:
> >
> > I have a Ceph Luminous cluster running on CentOS 7 nodes.
> > This cluster has 50 OSDs, all of the same size and all with the same weight.
> >
> > Since I noticed a quite "unfair" usage of the OSD nodes (some used at 30%, some at 70%), I tried to activate the balancer.
> >
> > But the balancer doesn't start, I guess because of this problem:
> >
> > [root@ceph-mon-01 ~]# ceph osd crush weight-set create-compat
> > Error EPERM: crush map contains one or more bucket(s) that are not straw2
> >
> > So I issued the command to convert from straw to straw2 (all the clients are running Luminous):
> >
> > [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
> > Error EINVAL: new crush map requires client version hammer but require_min_compat_client is firefly
> > [root@ceph-mon-01 ~]# ceph osd set-require-min-compat-client jewel
> > set require_min_compat_client to jewel
> > [root@ceph-mon-01 ~]# ceph osd crush set-all-straw-buckets-to-straw2
> > [root@ceph-mon-01 ~]#
> >
> > After issuing the command, the cluster went into WARNING state because ~12% of the objects were misplaced.
> >
> > Is this normal?
> > I read somewhere that the migration from straw to straw2 should trigger a data migration only if the OSDs have different sizes, which is not my case.
>
> The relevant sizes to compare are the crush buckets across which you are replicating.
> Are you replicating host-wise or rack-wise?
> Do you have hosts/racks with a different crush weight (e.g. a different size)?
> Maybe share your `ceph osd tree`.
>
> Cheers, dan
>
> > The cluster is still recovering, but what is worrying me is that data seems to be moving to the most used OSDs and the MAX_AVAIL value is decreasing quite quickly.
> >
> > I hope that the recovery can finish without causing problems: then I will immediately activate the balancer.
> >
> > But if some OSDs are getting too full, is it safe to decrease their weights while the cluster is still recovering?
> >
> > Thanks a lot for your help.
> > Of course I can provide other info if needed.
> >
> > Cheers, Massimo
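Regarding my question above about lowering the weights of the fullest OSDs while recovery is still running: what I had in mind is essentially the reweight-by-utilization logic, i.e. scaling the override reweight of over-full OSDs down towards the cluster average. Below is a minimal plain-Python sketch of that idea; the per-OSD utilization figures are invented, in practice they would come from `ceph osd df`, and I would apply the suggested values with `ceph osd reweight <id> <value>` (or simply run `ceph osd reweight-by-utilization`).

#!/usr/bin/env python3
# Minimal sketch of the reweight-by-utilization idea: nudge the override
# reweight of over-full OSDs down towards the cluster-average utilization.
# The per-OSD utilization numbers are invented for illustration; in practice
# they would come from `ceph osd df`, and the suggestions would be applied
# with `ceph osd reweight <id> <value>`.

osd_util = {0: 0.70, 1: 0.68, 30: 0.31, 40: 0.33}   # fraction of each OSD used
current_reweight = {osd: 1.00000 for osd in osd_util}

avg = sum(osd_util.values()) / len(osd_util)
threshold = 1.10   # only touch OSDs more than 10% above the average
max_step = 0.05    # never lower a reweight by more than this in one round

for osd, util in sorted(osd_util.items()):
    if util <= avg * threshold:
        continue
    target = current_reweight[osd] * (avg / util)        # proportional correction
    new = max(target, current_reweight[osd] - max_step)  # capped per round
    print(f"osd.{osd}: {util:.0%} used -> ceph osd reweight {osd} {new:.4f}")

I understand this only pushes data off the fullest OSDs a little at a time and is not a substitute for the balancer or for fixing the rack-level imbalance, but it might buy some headroom while the recovery finishes.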
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com