Hi,

I lost a node due to a CPU failure, so until the hardware is fixed I'm leaving it 
down and Ceph has marked its OSDs out. However, I'm keeping it in the CRUSH tree 
because it will be online again in 1-2 days.

This is the current state:
    health: HEALTH_WARN
            Degraded data redundancy: 1726/23917349718 objects degraded (0.000%), 1 pg degraded, 1 pg undersized

Out of the 9 OSD nodes, 2 of them (temporarily) have 4 NVMe OSDs and the other 7 
have only 2 (for the index pool with 3 replicas), which is where this PG lives.
(The down node is one of the 2x NVMe OSD ones.)
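(In case the layout matters, I'm reading it from the CRUSH tree, i.e. roughly:)

    # per-host view of all OSDs, including the down node that is still in the tree
    ceph osd tree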

This is part of the query output for that specific degraded PG:
...
 "up": [
     233,
     202
 ],
 "acting": [
     233,
     202
 ],
 "avail_no_missing": [
     "233",
     "202"
 ],
...
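(The snippet above is trimmed from the full PG query. Roughly what I ran, the 
second line assuming jq is available, to look for any probing/missing-OSD hint:)

    # full query of the degraded PG
    ceph pg 10.f6 query
    # only the peering/recovery section, where I'd expect a probing_osds list to appear
    ceph pg 10.f6 query | jq '.recovery_state'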

The above-mentioned OSDs are on the two nodes with 4 NVMe OSDs.

What is not clear: I don't see any probing OSD, or any other indicator pointing to 
the third, missing copy of that specific PG.

This is the pgs_brief:
PG_STAT  STATE                       UP         UP_PRIMARY  ACTING     ACTING_PRIMARY
10.f6    active+undersized+degraded  [233,202]  233         [233,202]  233
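(That line is from the brief PG dump, i.e. something like:)

    # one-line state summary per PG, filtered to the degraded one
    ceph pg dump pgs_brief | grep ^10.f6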

Shouldn't there be some indicator of what it is looking for?
Which OSD, or at least some probing OSD? (It's Quincy 17.2.7.)

If I purge the down OSDs on the down server, I guess it would kick off recovery; 
I'm just curious why it doesn't show anywhere what it is looking for.
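(By purge I mean the usual removal of the dead OSDs, something like the below; 
210 and 211 are just placeholder IDs for the two OSDs on the dead node:)

    # destructive: removes the OSD from the CRUSH map, its auth key and the OSD map
    ceph osd purge 210 --yes-i-really-mean-it
    ceph osd purge 211 --yes-i-really-mean-it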

Ty