This is just a follow-up for those who encounter a similar problem.
Originally this was a pool with only 4 nodes, size 3, min_size 2, and a big
node/OSD weight difference (node weights 10, 2, 4, 4; OSD weights from 2.5
down to 0.5). A detailed CRUSH map is below [1]; only 3 nodes were left at
that point, and the issue persisted.
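For anyone checking their own cluster for a similar imbalance, a minimal
sketch of how to inspect the weights with the standard ceph CLI (nothing
here is specific to this cluster):

# per-host and per-OSD CRUSH weights along with utilization
ceph osd df tree
# or just the weights as CRUSH sees them
ceph osd crush tree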
As we fixed the failed node the next day, the cluster rebalanced to its
original state without any issues, so a crush dump would be irrelevant at
this point, I guess. We will have to wait for the next occurrence.
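For the next occurrence, something along these lines should capture the
relevant state (the pg id below is a placeholder):

# list PGs currently in the remapped state
ceph pg ls remapped
# compare the up set with the acting set for a specific PG
ceph pg map <pgid>
# full peering detail for that PG
ceph pg <pgid> query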
Here's the tunables part; maybe it will help shed some light:
"tunables": {
"choose_local_trie
It seems like CRUSH cannot get enough OSDs for this PG.
What's the output of 'ceph osd crush dump', and especially the values in
the 'tunables' section?
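A minimal sketch of how to pull just those sections, assuming jq is
available (the filename is arbitrary):

# dump the full CRUSH map as JSON
ceph osd crush dump > crush_dump.json
# show only the tunables section
jq '.tunables' crush_dump.json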
Vladimir Prokofev wrote on Wed, Mar 27, 2019 at 4:02 AM:
CEPH 12.2.11, pool size 3, min_size 2.
One node went down today (the private network interface started flapping,
and after a while the OSD processes crashed). No big deal, the cluster
recovered, but not completely: 1 PG is stuck in the active+clean+remapped state.
PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACE