[ceph-users] Re: The reason of recovery_unfound pg

Satoru Takeuchi Sun, 22 Aug 2021 18:16:29 -0700

Hi Dominic,

On Sat, Aug 21, 2021 at 7:17, <dhils...@performair.com> wrote:
>
> Satoru;
>
> Ok.  What your cluster is telling you, then, is that it doesn't know which 
> replica is the "most current" or "correct" replica.  You will need to 
> determine that, and let ceph know which one to use as the "good" replica.  
> Unfortunately, I can't help you with this.  In fact, if this is critical 
> data, I'd seriously consider engaging a contractor to help you recover the 
> data, and help your cluster return to a fully operational status.
>
> I have found it helpful to set noout, and norebalance, when I intend to 
> reboot or offline any of my OSDs.
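
A minimal Python sketch of the flag handling described above, assuming the `ceph` CLI is on the PATH and the caller has admin credentials. The helper names `ceph` and `maintenance_flags` are hypothetical; the commands they issue are `ceph osd set <flag>` and `ceph osd unset <flag>`. The context manager is there so the flags get cleared even if the maintenance step fails partway.

```python
import subprocess
from contextlib import contextmanager

# Flags named in the advice above.
FLAGS = ("noout", "norebalance")


def ceph(*args, run=subprocess.run):
    """Run a ceph CLI command; raises CalledProcessError on failure."""
    return run(["ceph", *args], check=True, capture_output=True, text=True)


@contextmanager
def maintenance_flags(flags=FLAGS, ceph_cmd=ceph):
    """Set the given OSD flags, then unset them on exit (hypothetical helper)."""
    applied = []
    try:
        for flag in flags:
            ceph_cmd("osd", "set", flag)
            applied.append(flag)
        yield
    finally:
        # Unset only the flags that were actually set, in reverse order.
        for flag in reversed(applied):
            ceph_cmd("osd", "unset", flag)
```

Usage would look like `with maintenance_flags(): ...`, with the reboot or OSD maintenance step inside the block.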
>
> It's also critical to allow the cluster to return to a cluster state of 
> HEALTH_OK in between reboots.
>
> Thank you,


Thank you very much for your answer and advice!

Best,
Satoru

>
> Dominic L. Hilsbos, MBA
> Vice President – Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
> From: Satoru Takeuchi [mailto:satoru.takeu...@gmail.com]
> Sent: Friday, August 20, 2021 2:48 PM
> To: Dominic Hilsbos
> Cc: ceph-users
> Subject: Re: [ceph-users] Re: The reason of recovery_unfound pg
>
> Hi Dominic,
>
> On Sat, Aug 21, 2021 at 1:05, <dhils...@performair.com> wrote:
> Satoru;
>
> You said "after restarting all nodes one by one."  After each reboot, did 
> you allow the cluster the time necessary to come back to a "HEALTH_OK" status?
>
>
> No, we rebooted with the following policy.
>
> 1. Reboot one machine.
> 2. Wait until the reboot completes at the Kubernetes level (not at the Ceph 
> cluster level).
> 3. If there are other nodes to be rebooted, go to step 1.
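
For comparison, a variant of the steps above that waits at the Ceph level between reboots could look roughly like the following Python sketch. It assumes the `ceph` CLI is available and that the first word printed by `ceph health` is the overall status (HEALTH_OK, HEALTH_WARN, or HEALTH_ERR). The function names, the polling interval, the timeout, and the `reboot_node` callback are all hypothetical. As far as I know, flags such as noout keep the cluster in HEALTH_WARN while they are set, so they would need to be cleared before this wait can succeed.

```python
import subprocess
import time


def ceph_health_status(run=subprocess.run):
    """Return the first word of `ceph health`, e.g. 'HEALTH_OK'."""
    out = run(["ceph", "health"], check=True,
              capture_output=True, text=True).stdout
    words = out.split()
    return words[0] if words else ""


def wait_for_health_ok(get_status=ceph_health_status, poll_seconds=30,
                       timeout_seconds=3600, sleep=time.sleep):
    """Poll until the cluster reports HEALTH_OK; return False on timeout."""
    waited = 0
    while get_status() != "HEALTH_OK":
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True


def rolling_reboot(nodes, reboot_node, wait_ok=wait_for_health_ok):
    """Reboot nodes one by one, stopping if the cluster does not recover."""
    done = []
    for node in nodes:
        reboot_node(node)  # caller-supplied: reboot and wait for the node
        if not wait_ok():
            raise RuntimeError(
                f"cluster did not return to HEALTH_OK after rebooting {node}")
        done.append(node)
    return done
```

The point of the design is that the loop refuses to touch the next node until Ceph itself reports a healthy state, rather than relying on the node being Ready in Kubernetes.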
>
> I should have explained this logic to you as well.
> I realize now that the above logic is wrong and that I should wait for the 
> cluster to come back to HEALTH_OK.
> Unfortunately, I don't understand the meaning of the pg states well, and there 
> seem to be several states that mean "pg might be lost".
>
> https://docs.ceph.com/en/latest/rados/operations/pg-states/
>
> Could you tell me whether a pg can enter the `recovery_unfound` state in this case?
>
> Thanks,
> Satoru
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
