Hi all, I am trying to find a simple way that might help me better distribute my data, as I wrap up my Nautilus upgrades.
Currently I am rebuilding some OSDs with a bigger block.db to prevent BlueFS spillover, where it isn't difficult to do so, and I'm once again struggling with unbalanced distribution despite having used the upmap balancer.

I recently discovered that my previous use of the balancer module in crush-compat mode, before switching to upmap mode, has left some lingering compat weight sets, which I believe may account for my less-than-stellar distribution, as I now have 2-3 weightings fighting against each other (upmap balancer, compat weight set, reweight). Below is a snippet showing the compat weights differing from the CRUSH weights:

> $ ceph osd crush tree
> ID  CLASS WEIGHT   (compat)  TYPE NAME
> -55       43.70700 42.70894  chassis node2425
>  -2       21.85399 20.90097      host node24
>   0   hdd  7.28499  7.75699          osd.0
>   8   hdd  7.28499  6.85500          osd.8
>  16   hdd  7.28499  6.28899          osd.16
>  -3       21.85399 21.80797      host node25
>   1   hdd  7.28499  7.32899          osd.1
>   9   hdd  7.28499  7.24399          osd.9
>  17   hdd  7.28499  7.23499          osd.17

So my main question is: how do I [re]set the compat values to match the CRUSH weights, so that the upmap balancer can more precisely balance the data? It looks like I may have two options:

> ceph osd crush weight-set reweight-compat {name} {weight}

or

> ceph osd crush weight-set rm-compat

I assume the first would be for managing a single device/host/chassis/etc., and the latter would nuke all compat values across the board? (I've sketched the sequence I'm tentatively planning to run in the P.S. below.)

While looking at this, I also started poking at my tunables, and I have no clue how to interpret the values, nor what I believe they should be.

> $ ceph osd crush show-tunables
> {
>     "choose_local_tries": 0,
>     "choose_local_fallback_tries": 0,
>     "choose_total_tries": 50,
>     "chooseleaf_descend_once": 1,
>     "chooseleaf_vary_r": 1,
>     "chooseleaf_stable": 0,
>     "straw_calc_version": 1,
>     "allowed_bucket_algs": 22,
>     "profile": "firefly",
>     "optimal_tunables": 0,
>     "legacy_tunables": 0,
>     "minimum_required_version": "hammer",
>     "require_feature_tunables": 1,
>     "require_feature_tunables2": 1,
>     "has_v2_rules": 0,
>     "require_feature_tunables3": 1,
>     "has_v3_rules": 0,
>     "has_v4_buckets": 1,
>     "require_feature_tunables5": 0,
>     "has_v5_rules": 0
> }

This is a Jewel -> Luminous -> Mimic -> Nautilus cluster, and pretty much all the clients support Jewel/Luminous+ feature sets (the jewel clients are kernel CephFS clients, even though they run fairly recent (4.15-4.18) kernels).

> $ ceph features | grep release
>     "release": "luminous",
>     "release": "luminous",
>     "release": "luminous",
>     "release": "jewel",
>     "release": "jewel",
>     "release": "luminous",
>     "release": "luminous",
>     "release": "luminous",
>     "release": "luminous",

I feel like I should be running optimal tunables, but I believe I am currently running the defaults? I'm not sure how much of a difference exists there, or whether changing them would trigger a bunch of data movement (see the P.P.S. below).

Hopefully someone will be able to steer me in a positive direction here, so that I can ideally trigger a single, large data movement and return to a happy, balanced cluster once again.

Thanks,

Reed
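P.S. For what it's worth, here is the sequence I am tentatively planning to run if rm-compat turns out to be the right call -- please correct me if any step is off. The backup filename is just my own choice, and I'm assuming the upmap balancer is already enabled in the mgr:

$ ceph osd getcrushmap -o crushmap.backup   # keep a copy of the current CRUSH map, just in case
$ ceph osd crush weight-set rm-compat       # drop the lingering compat weight set entirely
$ ceph osd crush tree                       # verify the (compat) column is gone
$ ceph balancer status                      # confirm upmap mode is active so it can re-optimize
$ ceph osd df                               # watch per-OSD utilization converge afterwards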
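P.P.S. On the tunables side, if the answer is that I should move to optimal, I assume the command would simply be:

$ ceph osd crush tunables optimal   # my understanding only -- happy to be corrected

with the caveat that, as far as I understand (and I may well be wrong), that would flip chooseleaf_stable to 1 and trigger another round of data movement, and requires all clients to speak at least the jewel feature set -- which the ceph features output above suggests mine do. If so, I would plan to do it in the same window as the compat weight-set removal so there is only one big rebalance.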