What exactly does your crush rule look like right now? I assume it's
supposed to distribute data across two sites, and since one site is
missing, the PGs stay in a degraded state until the site comes back up.
You would need to either change the crush rule or assign a different
one to that pool, one that allows recovery on the remaining site.
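To illustrate the reasoning, here is a toy sketch in Python. It is not
the real CRUSH algorithm. It only models the counting logic of a rule
that takes a fixed number of hosts from each datacenter, which is my
assumption about your rule since it has not been posted in this thread.
The function name place_replicas and the bucket names dc1/dc2 are
hypothetical; the host names are only modelled on the ones in your
status output.

```python
# Toy model only: NOT the real CRUSH algorithm. It mimics a rule that
# takes at most `per_dc` hosts from every datacenter (an assumption about
# the rule in use; the actual rule was not posted in the thread).

def place_replicas(hosts_up_by_dc, per_dc):
    """Return the hosts chosen when at most `per_dc` up hosts are taken per datacenter."""
    chosen = []
    for dc in sorted(hosts_up_by_dc):
        chosen.extend(sorted(hosts_up_by_dc[dc])[:per_dc])
    return chosen

# Hypothetical layout: two datacenters with six hosts each.
both_sites = {
    "dc1": [f"pve-test01-{i:02d}" for i in range(1, 7)],
    "dc2": [f"pve-test02-{i:02d}" for i in range(1, 7)],
}
# Split brain: every host in dc1 is unreachable.
one_site_down = {"dc1": [], "dc2": both_sites["dc2"]}

healthy = place_replicas(both_sites, per_dc=2)     # two hosts per site
split = place_replicas(one_site_down, per_dc=2)    # same rule, one site gone
relaxed = place_replicas(one_site_down, per_dc=4)  # rule not capped at two per site

print(len(healthy), len(split), len(relaxed))
```

With the two-per-datacenter cap, the surviving site can never hold more
than two replicas, so the PGs stay undersized no matter how long you
wait. Only a rule without that cap (the last call) could place all four
copies on the remaining site.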
Quoting "ronny.lippold" <>:
hi stefan ... i did the next step and need your help.
my idea was to stretch the cluster without stretch mode. so we
decided to reserve a size of 4 on each side.
the setup is the same as in stretch mode, including crush rule, location,
election_strategy and tie breaker.
only "ceph mon enable_stretch_mode e stretch_rule datacenter" wasn't run.
now in my test, i caused a split brain and expected the cluster to
rebuild the 4 replicas on the remaining side.
but that did not happen.
actually, the cluster is doing the same thing as with stretch mode
enabled: writeable with 2 replicas.
can you explain to me why? i'm going around in circles.
this is the status during split brain:
pve-test02-01:~# ceph -s
id: 376fcdef-bba0-4e58-b63e-c9754dc948fa
6/13 mons down, quorum
1 datacenter (8 osds) down
8 osds down
6 hosts (8 osds) down
Degraded data redundancy: 2116/4232 objects degraded
(50.000%), 95 pgs degraded, 113 pgs undersized
mon: 13 daemons, quorum
pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker (age 54m), out of quorum: pve-test01-02, pve-test01-04, pve-test01-06, pve-test02-02, pve-test02-04,
mgr: pve-test02-05(active, since 53m), standbys: pve-test01-05,
pve-test01-01, pve-test01-03, pve-test02-01, pve-test02-03
mds: 1/1 daemons up, 1 standby
osd: 16 osds: 8 up (since 54m), 16 in (since 77m)
volumes: 1/1 healthy
pools: 5 pools, 113 pgs
objects: 1.06k objects, 3.9 GiB
usage: 9.7 GiB used, 580 GiB / 590 GiB avail
pgs: 2116/4232 objects degraded (50.000%)
95 active+undersized+degraded
18 active+undersized
client: 17 KiB/s wr, 0 op/s rd, 10 op/s wr
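The degraded figures in the status above can be cross-checked with a
little arithmetic. This is only a sketch under two assumptions that are
not stated in the output itself: that "1.06k objects" stands for 1058
objects, and that all pools use size 4 with two copies per site.

```python
# Assumptions (not taken verbatim from the status output):
#   - "1.06k objects" is a rounded display of 1058 objects
#   - every pool has size 4, with 2 copies in each datacenter
objects = 1058
size = 4
copies_per_site = 2

total_copies = objects * size               # all object copies the pools should hold
missing_copies = objects * copies_per_site  # copies that lived in the down datacenter
degraded_pct = 100 * missing_copies / total_copies

print(f"{missing_copies}/{total_copies} objects degraded ({degraded_pct:.3f}%)")
```

Under these assumptions the result matches the "2116/4232 objects
degraded (50.000%)" line, which fits a cluster that is still running on
exactly the two copies of the surviving site.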
thanks a lot,
On 2024-04-30 11:42, Stefan Kooman wrote:
On 30-04-2024 11:22, ronny.lippold wrote:
hi stefan ... you are the hero of the month ;)
i don't know why i did not find your bug report.
i have the exact same problem and could only resolve the HEALTH state with
"ceph osd force_healthy_stretch_mode --yes-i-really-mean-it"
will comment on the report soon.
actually, we are thinking about 4/2 size without stretch mode enabled.
what was your solution?
This specific setup (on which I did the testing) is going to be
full flash (SSD). So the HDDs are going to be phased out. And only
the default non-device-class crush rule will be used. While that
will work for this (small) cluster, it is not a solution. This
issue should be fixed, as I figure there are quite a few clusters
that want to use device-classes and use stretch mode at the same time.
Gr. Stefan
ceph-users mailing list --
To unsubscribe send an email to