This sounds like you have widespread inconsistencies that are surfaced by 
scrubs, not caused by them.  Frequent causes:

* Using a RAID HBA with firmware bugs (all of them, in my experience), with 
broken preserved-cache replay, with writeback cache forced on without a BBU, etc.
* Power was out longer than BBUs could last
* Volatile write cache enabled on HDDs
* Client-grade SSDs without PLP (a quick check for these last two is sketched below)
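
For the last two items, a quick check across the OSD hosts is cheap. Below is 
a minimal sketch of my own (not taken from your environment): it assumes a 
plain Linux host where the data disks show up as /dev/sd* and hdparm is 
installed; disks hidden behind a RAID HBA usually won't answer honestly, and 
client SSD PLP is a datasheet question rather than something you can query.

#!/usr/bin/env python3
# Minimal sketch: report the volatile write-cache state of every rotational
# /dev/sd* device.  Read-only: `hdparm -W <dev>` with no value only queries.
import glob
import os
import subprocess

def rotational_disks():
    # The kernel marks spinning disks with queue/rotational == 1.
    for path in glob.glob("/sys/block/sd*"):
        with open(os.path.join(path, "queue/rotational")) as f:
            if f.read().strip() == "1":
                yield "/dev/" + os.path.basename(path)

for dev in rotational_disks():
    out = subprocess.run(["hdparm", "-W", dev],
                         capture_output=True, text=True).stdout.strip()
    # The last line of the output looks like " write-caching =  1 (on)".
    if out:
        status = out.splitlines()[-1].strip()
    else:
        status = "(no answer, possibly behind an HBA)"
    print(dev, status)

If write caching is on and the cache is not protected, a hard power loss can 
drop acknowledged writes, which is exactly the kind of damage a later scrub 
will surface.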

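It is also worth capturing what the scrubs actually flag before you repair: 
read errors and checksum mismatches point at disks and controllers, while 
missing shards point at lost writes.  Another sketch under the same caveats 
(it only wraps the standard `ceph osd pool ls`, `rados list-inconsistent-pg` 
and `rados list-inconsistent-obj` commands and assumes an admin keyring on 
the node it runs on):

#!/usr/bin/env python3
# Minimal sketch: print per-shard error details for every inconsistent PG.
import json
import subprocess

def cli_json(*cmd):
    # Run a Ceph CLI command and parse its JSON output.
    out = subprocess.run([*cmd, "--format=json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)

for pool in cli_json("ceph", "osd", "pool", "ls"):
    for pgid in cli_json("rados", "list-inconsistent-pg", pool):
        report = cli_json("rados", "list-inconsistent-obj", pgid)
        for obj in report.get("inconsistents", []):
            name = obj["object"]["name"]
            for shard in obj.get("shards", []):
                errors = shard.get("errors", [])
                if errors:
                    print(f"{pgid} {name} osd.{shard['osd']} {errors}")

If the flagged shards keep landing on OSDs that share a host, an HBA, or a PSU 
that went through the outage, that is where to look first.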
> On Mar 11, 2025, at 6:57 AM, Martin Konold <martin.kon...@konsec.com> wrote:
> 
> 
> Hi,
> 
> I suspect a hw issue. Please check the networks.
> 
> Regards 
> --martin
> 
> On 11.03.2025 at 11:24, Marianne Spiller <maria...@spiller.me> wrote:
> Dear list,
> 
> I'm currently maintaining several Ceph (prod) installations. One of them 
> consists of 3 MON hosts and 6 OSD hosts with 40 OSDs in total. There are 
> also 5 separate Proxmox hosts - they only run the VMs and use the storage 
> provided by Ceph, but they are not part of Ceph.
> 
> The worst case happened: due to an outage, all of these hosts crashed at 
> almost the same time.
> 
> Last week, I began restarting (only the Ceph hosts; the Proxmox servers are 
> still down). Ceph was very unhappy with the situation as a whole: one OSD 
> host (and its 6 OSDs) is completely gone, there are some hardware issues 
> (33 OSDs left, networking, PSU; I'm working on it), and 73 out of 129 PGs 
> were inconsistent.
> 
> Meanwhile, the overall status of the cluster is "HEALTHY" again.
> 
> But nearly every day, one or two PGs get damaged - never on the same OSDs. 
> And there is no traffic on the storage, as the virtualization hosts are not 
> running. I see no obvious cause in the logs: everything looks fine, a scrub 
> starts and leaves one or more PGs damaged. Repairing them succeeds, but 
> perhaps the next night another PG is affected.
> 
> Do you have any hints on how to investigate this further? I would love to 
> understand more before starting the Proxmox cluster again. I am using 
> Ceph 18.2.4 (Proxmox packages).
> 
> Thanks a lot,
> 
>   Marianne
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io