Hi Josh, 

Thanks for your reply. 
I already tried that, but with no luck.
The primary OSD goes down and hangs forever as soon as I issue the
"mark_unfound_lost delete" command.
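
That is, with 20.13f being the PG in question:

    ceph pg 20.13f mark_unfound_lost delete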

I guess the PG is too damaged to salvage, unless one really starts deleting the
individual corrupt objects?
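
Should anyone want to go down that road, I believe it would look roughly like
this per shard (OSD id, data path, shard number and the object JSON are
placeholders, and the OSD has to be stopped while ceph-objectstore-tool runs):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> --pgid 20.13fs<N> --op list
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> '<object-json-from-list>' remove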

Anyway, as I said, the files in the PG are identified and under backup, so I just
want it healthy again, no matter what ;-)

I actually discovered that removing the PG's shards with ceph-objectstore-tool
indeed works in getting the PG back to active+clean (containing 0 objects, though).
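
In case it helps others, the shard removal looked roughly like this on each OSD
holding a shard (OSD id and shard number are placeholders, and the OSD must be
stopped while ceph-objectstore-tool runs):

    systemctl stop ceph-osd@<osd-id>
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<osd-id> --pgid 20.13fs<N> --op remove --force
    systemctl start ceph-osd@<osd-id>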

One just needs to run a final remove - start/stop OSD - repair - mark-complete
on the primary OSD.
A scrub tells me that the "active+clean" state is for real.
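
Roughly, the finishing steps on the primary were along these lines (placeholder
ids again, with mark-complete run while the OSD was stopped):

    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<primary-id> --pgid 20.13fs<N> --op mark-complete
    systemctl start ceph-osd@<primary-id>
    ceph pg repair 20.13f
    ceph pg deep-scrub 20.13f
    ceph pg 20.13f query | grep '"state"'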

I also found out that the more automated "force-create-pg" command only works on
PGs that are in the down state.
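
So, for anyone attempting the same, it is worth checking the PG state first
(20.13f again just being my PG):

    ceph pg dump pgs_brief | grep ^20.13f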

Best, 
Jesper  
 

--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: je...@mbg.au.dk
Tlf:    +45 50906203

> On 20 Sep 2022, at 15.40, Josh Baergen <jbaer...@digitalocean.com> wrote:
> 
> Hi Jesper,
> 
> Given that the PG is marked recovery_unfound, I think you need to
> follow 
> https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#unfound-objects.
> 
> Josh
> 
> On Tue, Sep 20, 2022 at 12:56 AM Jesper Lykkegaard Karlsen
> <je...@mbg.au.dk> wrote:
>> 
>> Dear all,
>> 
>> System: latest Octopus, 8+3 erasure Cephfs
>> 
>> I have a PG that has been driving me crazy.
>> It had gotten to a bad state after heavy backfilling, combined with OSD 
>> going down in turn.
>> 
>> State is:
>> 
>> active+recovery_unfound+undersized+degraded+remapped
>> 
>> I have tried repairing it with ceph-objectstore-tool, but no luck so far.
>> Given the time recovery takes this way and since data are under backup, I 
>> thought that I would do the "easy" approach instead and:
>> 
>>  *   scan pg_files with cephfs-data-scan
>>  *   delete data belonging to that pool
>>  *   recreate PG with "ceph osd force-create-pg"
>>  *   restore data
>> 
>> Although, this has shown not to be so easy after all.
>> 
>> ceph osd force-create-pg 20.13f --yes-i-really-mean-it
>> 
>> seems to be accepted well enough with "pg 20.13f now creating, ok", but then 
>> nothing happens.
>> Issuing the command again just gives a "pg 20.13f already creating" response.
>> 
>> If I restart the primary OSD, then the pending force-create-pg disappears.
>> 
>> I read that this could be due to crush map issue, but I have checked and 
>> that does not seem to be the case.
>> 
>> Would it, for instance, be possible to do the force-create-pg manually with 
>> something like this?:
>> 
>>  *   set nobackfill and norecovery
>>  *   delete the pgs shards one by one
>>  *   unset nobackfill and norecovery
>> 
>> 
>> Any idea on how to proceed from here is most welcome.
>> 
>> Thanks,
>> Jesper
>> 
>> 
>> --------------------------
>> Jesper Lykkegaard Karlsen
>> Scientific Computing
>> Centre for Structural Biology
>> Department of Molecular Biology and Genetics
>> Aarhus University
>> Universitetsbyen 81
>> 8000 Aarhus C
>> 
>> E-mail: je...@mbg.au.dk
>> Tlf:    +45 50906203
>> 
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
