Hi all,

I am looking for a simple way to better distribute my data as I wrap up my
Nautilus upgrades.

I am currently rebuilding some OSDs with a bigger block.db to prevent BlueFS
spillover, where it isn't too difficult to do so, and I'm once again struggling
with an unbalanced distribution despite having used the upmap balancer.
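
In case it matters, I assume the way to confirm the balancer is actually in
upmap mode and active is simply
> $ ceph balancer status
(correct me if there is a better check).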

I recently discovered that my previous use of the balancer module in
crush-compat mode, before switching to upmap mode, has left some lingering
compat weight sets, which I believe may account for my less-than-stellar
distribution, as I now have two or three weightings fighting against each other
(upmap balancer, compat weight set, reweight). Below is a snippet showing the
compat weights differing from the CRUSH weights.

> $ ceph osd crush tree
> ID  CLASS WEIGHT    (compat)  TYPE NAME
> -55        43.70700  42.70894         chassis node2425
>  -2        21.85399  20.90097             host node24
>   0   hdd   7.28499   7.75699                 osd.0
>   8   hdd   7.28499   6.85500                 osd.8
>  16   hdd   7.28499   6.28899                 osd.16
>  -3        21.85399  21.80797             host node25
>   1   hdd   7.28499   7.32899                 osd.1
>   9   hdd   7.28499   7.24399                 osd.9
>  17   hdd   7.28499   7.23499                 osd.17
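
For the third weighting, the old reweight values, I assume the thing to compare
is just the REWEIGHT column from
> $ ceph osd df tree
against the compat column above.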

So my main question is: how do I [re]set the compat values to match the CRUSH
weights, so that the upmap balancer can balance the data more precisely?

It looks like I may have two options, with 
> ceph osd crush weight-set reweight-compat {name} {weight}
or
> ceph osd crush weight-set rm-compat

I assume the first is for adjusting a single device/host/chassis/etc., and the
latter would nuke all compat values across the board?
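
If I'm reading that right, I'm guessing the two would look something like this
(the OSD name and weight here are just placeholders taken from my tree above):
> $ ceph osd crush weight-set reweight-compat osd.0 7.28499
to set a single compat entry back to its CRUSH weight, or
> $ ceph osd crush weight-set rm-compat
to drop the compat weight set entirely and let the CRUSH weights stand alone.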

And while looking at this, I started poking at my tunables, and I have no clue
how to interpret the values, nor what they should be.

> $ ceph osd crush show-tunables
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 0,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 22,
>     "profile": "firefly",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "minimum_required_version": "hammer",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 1,
>     "require_feature_tunables5": 0,
>     "has_v5_rules": 0
> }

This is a Jewel -> Luminous -> Mimic -> Nautilus cluster, and pretty much all
the clients support Jewel/Luminous+ feature sets (the jewel clients are
kernel-CephFS clients, even though they are on recent (4.15-4.18) kernels).
> $ ceph features | grep release
>             "release": "luminous",
>             "release": "luminous",
>             "release": "luminous",
>             "release": "jewel",
>             "release": "jewel",
>             "release": "luminous",
>             "release": "luminous",
>             "release": "luminous",
>             "release": "luminous",

I feel like I should be running optimal tunables, but I believe I am running
the defaults (the profile above reports "firefly")?
I'm not sure how much of a difference that makes, or whether changing it would
trigger a bunch of data movement.
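
If the consensus is that I should move to optimal tunables, I'm guessing that
is simply
> $ ceph osd crush tunables optimal
ideally timed alongside the compat cleanup so I only eat one round of
rebalancing.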

Hopefully someone will be able to steer me in a positive direction here, so
that I can, ideally, trigger a single large data movement and return to a
happy, balanced cluster once again.

Thanks,

Reed
