> 
> OK, good to know about the 5% misplaced objects report 😊
> 
> I just checked 'ceph -s' and the misplaced objects is showing 1.948%, but I 
> suspect I will see this up to 5% or so later on 😊

If you see a place in the docs where it would help to note this balancer 
phenomenon, so that people don’t mistakenly think that progress isn’t, well, 
progressing, please let me know and I’ll make it so.

> It does finally look like there is progress being made, as my "active+clean" 
> is currently 292 (out of a total of 321 PGs), whereas it wasn't seeming to progress 
> beyond 287 or so.  This is the result:
> 
> --- START ---
> root@cephnode01:/# ceph -s
>  cluster:
>    id:     474264fe-b00e-11ee-b586-ac1f6b0ff21a
>    health: HEALTH_ERR
>            1 failed cephadm daemon(s)
>            622 scrub errors
>            Possible data damage: 8 pgs inconsistent

Inconsistent isn’t good.  Check the drives behind the OSDs in those PGs for 
medium errors, grown defects, etc.  Try repairs, but if the drives are failing, 
the problem will likely come back.
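
For reference, roughly what that investigation might look like (the PG ID and 
device below are placeholders, substitute your own):

   # See which PGs are inconsistent and why
   ceph health detail

   # Show the objects/shards in a given PG that failed scrub
   rados list-inconsistent-obj <pgid> --format=json-pretty

   # Ask Ceph to attempt a repair of that PG
   ceph pg repair <pgid>

   # Check the underlying drive for medium errors / grown defects
   smartctl -a /dev/sdX
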

> 
>    pgs:     566646/29086680 objects misplaced (1.948%)
>             292 active+clean
>             13  active+remapped+backfilling
>             7   active+clean+inconsistent
>             5   active+remapped+backfill_wait
>             2   active+clean+scrubbing
>             1   active+clean+scrubbing+deep
>             1   active+remapped+inconsistent+backfilling
> 
>  io:
>    client:   30 MiB/s rd, 2.5 MiB/s wr, 187 op/s rd, 240 op/s wr
>    recovery: 189 MiB/s, 48 objects/s
> --- END ---
> 
> I think the main numbers I need to keep an eye on are the ones that are 
> "backfilling"?  The "scrub" ones are just normal scrubs that are going on?

Yes, scrubs are normal.  Shallow scrubs are lightweight, so they typically run 
daily; the default interval for deep scrubs is weekly.  With HDDs, especially 
with EC, it is not unusual to extend the deep scrub interval so that the 
cluster can get through them without DoSing clients.

> 
> I am waiting for a full "active+clean" before dealing with the Scrub errors 
> (which are a result of the failed OSD).

No reason you necessarily have to wait, though the above backfill shouldn’t take long.
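
If you'd rather not keep re-running `ceph -s` while it finishes, something 
like this is an easy way to watch the remaining backfill (state names can 
vary slightly by release):

   # PGs currently backfilling or waiting to backfill
   ceph pg ls backfilling
   ceph pg ls backfill_wait

   # Or just poll the summary
   watch -n 30 ceph -s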

> 
> Also, my "TiB used" keeps going up as well; is that because of my lost HDD 
> (the HDDs are 16TB).

TiB used as reported by the dashboard or `ceph df`?  Client data organic growth?
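
If it's raw usage you're watching, these are the first places I'd look to see 
where the growth is coming from:

   # Cluster-wide raw usage vs. per-pool stored data
   ceph df detail

   # Per-OSD utilization, handy for spotting imbalance during backfill
   ceph osd df tree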

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
