[ceph-users] Re: Balancing PGs across OSDs

2019-11-19 Thread Thomas Schneider
Hello Paul, thanks for your analysis. I want to share more statistics of my cluster to follow up on your response "You have way too few PGs in one of the roots". Here are the pool details: root@ld3955:~# ceph osd pool ls detail pool 11 'hdb_backup' replicated size 3 min_size 2 crush_rule 1 …
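
For context on how balanced the cluster actually is, the per-OSD PG spread can be checked directly; a quick sketch using standard commands (no assumed names):

    # Per-OSD utilization and PG count, laid out along the CRUSH tree
    ceph osd df tree
    # Pool details, including pg_num and the crush_rule each pool uses
    ceph osd pool ls detail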

[ceph-users] Re: Balancing PGs across OSDs

2019-11-19 Thread Konstantin Shalygin
On 11/19/19 4:01 PM, Thomas Schneider wrote: If Ceph is not capable of managing rebalancing automatically, how can I proceed to rebalance the data manually? Use offline upmap for your target pool: ceph osd getmap -o om; osdmaptool om --upmap upmap.sh --upmap-pool=hdd_backup --upmap-deviation …
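
A sketch of the complete offline-upmap workflow the truncated command refers to; the pool name is taken from the message, but the deviation value shown here is an assumption since it is cut off above:

    # Grab the current osdmap from the cluster
    ceph osd getmap -o om
    # Compute upmap entries that even out PGs for one pool; this writes
    # a script of "ceph osd pg-upmap-items" commands to upmap.sh
    osdmaptool om --upmap upmap.sh --upmap-pool=hdd_backup --upmap-deviation 1
    # Review upmap.sh, then apply it
    bash upmap.sh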

[ceph-users] Re: add debian buster stable support for ceph-deploy

2019-11-19 Thread Jelle de Jong
On 11/18/19 8:08 PM, Paul Emmerich wrote: We maintain an unofficial mirror for Buster packages: https://croit.io/2019/07/07/2019-07-07-debian-mirror Thank you, Paul. Yes, I have seen the repository; however, there is no ceph-deploy version in there, and ceph-deploy checks the version of Debian …

[ceph-users] How proceed to change a crush rule and remap pg's?

2019-11-19 Thread Maarten van Ingen
Hi, I have a small but impacting error in my crush rules. For unknown reasons the rules are using osd rather than host as the failure domain, so some nodes hold all three copies instead of the copies being spread across three different nodes. We noticed this when rebooting a node and a pg became stale. My crush rule: …
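
For reference, a minimal replicated rule that spreads copies across hosts looks roughly like this; names and ids are illustrative, and the broken variant described above would have "type osd" in the chooseleaf step:

    rule replicated_host {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }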

[ceph-users] Re: add debian buster stable support for ceph-deploy

2019-11-19 Thread Paul Emmerich
Correct, we don't package ceph-deploy, sorry. ceph-deploy is currently unmaintained; I wouldn't use it for a production setup at the moment. Paul

[ceph-users] Re: How proceed to change a crush rule and remap pg's?

2019-11-19 Thread Paul Emmerich
I don't think that there's a feasible way to do this in a controlled manner. I would just change it and trust Ceph's remapping mechanism to work properly. You could use crushtool to calculate what the new mapping is and then do something crazy with upmaps (move them manually to the new locations …
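
A hedged sketch of the crushtool dry run mentioned here, which previews the new mapping without touching the cluster (file names and the rule/replica numbers are placeholders):

    # Export and decompile the current CRUSH map
    ceph osd getcrushmap -o crush.bin
    crushtool -d crush.bin -o crush.txt
    # Edit crush.txt (e.g. change "type osd" to "type host"), then recompile
    crushtool -c crush.txt -o crush-new.bin
    # Show where PGs would map under the edited rule
    crushtool -i crush-new.bin --test --show-mappings --rule 1 --num-rep 3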

[ceph-users] Re: How proceed to change a crush rule and remap pg's?

2019-11-19 Thread Maarten van Ingen
Thanks. The crushtool didn't help me much further unless I did something crazy, as you said. So I have started by creating a new, correct rule and changing the pools one by one to use the new rule. This seems to work fine, and as far as I can see it didn't impact any users (much).
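
The approach described, sketched with placeholder names for the rule, root, and pool:

    # Create a replicated rule with host as the failure domain
    ceph osd crush rule create-replicated replicated_host default host
    # Switch each pool over to the new rule, one at a time
    ceph osd pool set <pool> crush_rule replicated_host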

[ceph-users] jewel OSDs refuse to start up again

2019-11-19 Thread Janne Johansson
Three OSDs, holding the 3 replicas of a PG here, are only half-starting, and hence that single PG gets stuck as "stale+active+clean". All died of suicide timeout while walking over a huge omap (pool 7 'default.rgw.buckets.index') and would not get the PG 7.b back online again. From the logs, they …
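
A common workaround in this situation (an assumption on my part, not something stated in the thread) is to temporarily raise the OSD op thread timeouts so the daemon can finish walking the large omap without hitting the suicide timeout, e.g. in ceph.conf on the affected OSDs:

    [osd]
    # Temporary values while recovering the PG; revert afterwards
    osd op thread timeout = 90
    osd op thread suicide timeout = 2000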

[ceph-users] Re: msgr2 not used on OSDs in some Nautilus clusters

2019-11-19 Thread Bryan Stillwell
Closing the loop here. I figured out that I missed a step during the Nautilus upgrade which was causing this issue: ceph osd require-osd-release nautilus. If you don't do this, your cluster will start having problems once you enable msgr2: ceph mon enable-msgr2. Based on how hard this was to …
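
To verify that both steps took effect, a quick sketch (output details vary by release):

    # The osdmap records the require-osd-release flag
    ceph osd dump | grep require_osd_release
    # After enable-msgr2, monitors should advertise both v2 and v1 addresses
    ceph mon dump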

[ceph-users] Re: msgr2 not used on OSDs in some Nautilus clusters

2019-11-19 Thread Paul Emmerich
There should be a warning that says something like "all OSDs are running nautilus but require-osd-release nautilus is not set". That warning did exist for older releases; pretty sure nautilus also has it? Paul

[ceph-users] Re: msgr2 not used on OSDs in some Nautilus clusters

2019-11-19 Thread Bryan Stillwell
I know I've seen that warning before, but for some reason it wasn't alerting on these clusters, which were upgraded to 14.2.2 first and then to 14.2.4. Bryan

[ceph-users] mgr hangs with upmap balancer

2019-11-19 Thread Bryan Stillwell
On multiple clusters we are seeing the mgr hang frequently when the balancer is enabled. It seems that the balancer is getting caught in some kind of infinite loop that chews up all the CPU for the mgr, which causes problems with other modules like prometheus (we don't have the devicehealth module …
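
While investigating, the balancer can be inspected and switched off without restarting the mgr; a sketch, with the mgr name as a placeholder:

    # Show mode, pending plans, and whether the balancer is active
    ceph balancer status
    # Disable the automatic balancer while debugging the hang
    ceph balancer off
    # If the mgr is already wedged, fail over to a standby
    ceph mgr fail <active-mgr-name>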