Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-07 Thread Paweł Sadowski
Hi, I work with Tomasz and I'm investigating this situation. We still don't fully understand why there was an unfound object after removing a single OSD. From logs[1] it looks like all PGs were active+clean before marking that OSD out. After that backfills started on multiple OSDs. Three minutes later

Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-02 Thread Dan van der Ster
Hi, do you understand why removing that OSD led to unfound objects? Do you have the ceph.log from yesterday? Cheers, Dan On 2 Oct 2016 10:18, "Tomasz Kuzemko" wrote: > > Forgot to mention Ceph version - 0.94.5. > > I managed to fix this. By chance I found that when an OSD for a blocked PG is st

Re: [ceph-users] unfound objects blocking cluster, need help!

2016-10-02 Thread Tomasz Kuzemko
Forgot to mention Ceph version - 0.94.5. I managed to fix this. By chance I found that when an OSD for a blocked PG is starting, there is a few-second time window (after load_pgs) in which it accepts commands related to the blocked PG. So first I managed to capture "ceph pg PGID query" this way. T
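A rough sketch of how such a short window might be caught, in case it helps anyone reading the archive: retry the command in a tight loop while the OSD starts, and stop at the first success. This is not the procedure used in the message above. The helper name `retry_cmd`, the attempt count, the delay and the `timeout` wrapper are all assumptions made for illustration, and it assumes the command fails or hangs outside the window.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command repeatedly until it succeeds
# or the attempts are used up.
# usage: retry_cmd ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_cmd() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        # Stop at the first successful run of the command.
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    # Every attempt failed.
    return 1
}

# Possible usage while the OSD is starting (PGID is a placeholder;
# "timeout 2" is an assumption to keep a hung call from blocking the loop):
#   retry_cmd 300 0.1 timeout 2 ceph pg PGID query > pg-query.json
```

The loop only improves the odds of landing inside the window; the numbers would need tuning to how long the OSD takes to start.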

[ceph-users] unfound objects blocking cluster, need help!

2016-10-01 Thread Tomasz Kuzemko
Hi, I have a production cluster on which 1 OSD on a failing disk was slowing the whole cluster down. I removed the OSD (osd.87) as usual in such cases, but this time it resulted in 17 unfound objects. I no longer have the files from osd.87. I was able to call "ceph pg PGID mark_unfound_lost delete