Hello Paul,
thanks for your analysis.
I want to share more statistics from my cluster to follow up on your
response "You have way too few PGs in one of the roots".
Here are the pool details:
root@ld3955:~# ceph osd pool ls detail
pool 11 'hdb_backup' replicated size 3 min_size 2 crush_rule 1
objec
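(For context on the "too few PGs" point, two read-only commands that show the
PG count for that pool and how PGs are spread across the roots and OSDs; both
are stock ceph CLI:)
ceph osd pool get hdb_backup pg_num
ceph osd df tree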
On 11/19/19 4:01 PM, Thomas Schneider wrote:
If Ceph is not capable of managing rebalancing automatically, how can I
proceed to rebalance the data manually?
Use offline upmap for your target pool:
ceph osd getmap -o om; osdmaptool om --upmap upmap.sh
--upmap-pool=hdd_backup --upmap-deviation
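(For the archive, the complete offline-upmap workflow that snippet belongs to
looks roughly like this; the deviation value of 1 is an assumption, and note
the pool shown above is listed as 'hdb_backup':)
# dump the current osdmap to a file
ceph osd getmap -o om
# compute upmap entries for the pool; writes ceph CLI calls to upmap.sh
osdmaptool om --upmap upmap.sh --upmap-pool=hdb_backup --upmap-deviation 1
# review upmap.sh, then apply it
bash upmap.sh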
On 11/18/19 8:08 PM, Paul Emmerich wrote:
We maintain an unofficial mirror for Buster packages:
https://croit.io/2019/07/07/2019-07-07-debian-mirror
Thank you, Paul. Yes, I have seen the repository; however, there is no
ceph-deploy version in there, and ceph-deploy checks the version of
Debian a
Hi,
I have a small but impactful error in my CRUSH rules.
For unknown reasons the rules are using osd instead of host to place the data,
so some nodes hold all three copies instead of the copies being spread across
three different nodes.
We noticed this when rebooting a node and a PG became stale.
My crush rule:
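(The rule itself is cut off in the archive. For reference only, and not the
poster's actual rule: a replicated rule that spreads copies across hosts ends
its placement with "step chooseleaf firstn 0 type host", while the broken
variant described above has "type osd" there instead. A generic sketch with a
made-up rule id:)
rule replicated_host {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}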
Correct, we don't package ceph-deploy, sorry.
ceph-deploy is currently unmaintained; I wouldn't use it for a
production setup at the moment.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
I don't think that there's a feasible way to do this in a controlled
manner. I would just change it and trust Ceph's remapping mechanism to work
properly.
You could use crushtool to calculate what the new mapping is and then do
something crazy with upmaps (move them manually to the new locations o
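(A rough sketch of the crushtool part mentioned there; the rule id and replica
count are assumptions:)
# export the compiled crush map and test a candidate rule's mappings
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --rule 1 --num-rep 3 --show-mappings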
Thanks,
The crushtool didn't help me much further unless I did something crazy, as you
said.
So I have started by creating a new, correct rule and changing the
pools one by one to use the new rule.
This seems to work fine, and as far as I can see it didn't impact any users
(much).
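(For anyone following along, the two commands this boils down to are roughly
these; the rule name is an example:)
# create a replicated rule with host as the failure domain under the default root
ceph osd crush rule create-replicated replicated_host default host
# point a pool at the new rule; Ceph then remaps the affected PGs on its own
ceph osd pool set <pool-name> crush_rule replicated_host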
Three OSDs holding the 3 replicas of a PG here are only half-starting, and
hence that single PG gets stuck as "stale+active+clean".
All of them died of a suicide timeout while walking over a huge omap (pool 7
'default.rgw.buckets.index') and would not bring PG 7.b back online
again.
From the logs, they
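(A couple of read-only commands that help pin down a PG in that state and the
OSDs behind it; pg id 7.b as above:)
ceph pg dump_stuck stale
ceph pg map 7.b
ceph osd find <osd-id>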
Closing the loop here. I figured out that I missed a step during the Nautilus
upgrade which was causing this issue:
ceph osd require-osd-release nautilus
If you don't do this, your cluster will start having problems once you enable
msgr2:
ceph mon enable-msgr2
Based on how hard this was to tr
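(Two quick checks for the same situation, both standard CLI: the first shows
the currently required release, the second confirms every daemon is actually
on nautilus before you set the flag:)
ceph osd dump | grep require_osd_release
ceph versions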
There should be a warning that says something like "all OSDs are
running nautilus but require-osd-release nautilus is not set".
That warning did exist for older releases; pretty sure nautilus also has it?
Paul
I know I've seen that warning before, but for some reason it wasn't alerting on
these clusters, which were upgraded to 14.2.2 first and then to 14.2.4.
Bryan
> On Nov 19, 2019, at 3:20 PM, Paul Emmerich wrote:
>
> There should be a warni
On multiple clusters we are seeing the mgr hang frequently when the balancer is
enabled. It seems that the balancer is getting caught in some kind of infinite
loop that chews up all the CPU for the mgr, which causes problems with other
modules like prometheus (we don't have the devicehealth mod
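(For anyone hitting the same thing, the balancer can be switched off while
debugging, and a stuck mgr can be bounced; the mgr unit name is host-specific:)
ceph balancer status
ceph balancer off
# on the node running the active mgr:
systemctl restart ceph-mgr@$(hostname -s)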