On Wed, May 11, 2016 at 6:53 PM,  <george.vasilaka...@stfc.ac.uk> wrote:
> Hey Dan,
>
> This is on Hammer 0.94.5. osd.52 has always been on a problematic machine, and
> when this happened it had less data on its local disk than the other OSDs.
> I've tried adapting that blog post's solution to this situation, to no avail.


Do you have a log of what you did and why it didn't work? I guess the
solution to your issue lies in a version of that procedure.

-- dan




>
> I've tried things like looking at all the probing OSDs in the query output and
> importing the data from one copy to all of them to get them consistent.
> One of the major red flags was that, when I looked at the original acting
> set's disks, each OSD had a different amount of data for the same PG. There is
> at least one PG where osd.52 (the primary for all four) actually had about
> 1 GB (~27%) less data than the others; everything has been really inconsistent.
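>
> Concretely, for 1.5c2 that looked something like this (default FileStore
> paths, with the relevant OSD stopped while running the tool; a sketch of
> what I did rather than an exact transcript):
>
> # export the fullest surviving copy, e.g. from osd.191
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-191 \
>     --journal-path /var/lib/ceph/osd/ceph-191/journal \
>     --op export --pgid 1.5c2 --file /tmp/1.5c2.export
>
> # on each other probing OSD: drop its partial copy, then import the export
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
>     --journal-path /var/lib/ceph/osd/ceph-109/journal \
>     --op remove --pgid 1.5c2
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
>     --journal-path /var/lib/ceph/osd/ceph-109/journal \
>     --op import --file /tmp/1.5c2.export
>
> After restarting the OSDs the PGs still came back incomplete.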
>
> Here's hoping Cunningham will come to the rescue.
>
> Cheers,
>
> George
>
> ________________________________________
> From: Dan van der Ster [d...@vanderster.com]
> Sent: 11 May 2016 17:28
> To: Vasilakakos, George (STFC,RAL,SC)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Incomplete PGs, how do I get them back without data 
> loss?
>
> Hi George,
>
> Which version of Ceph is this?
> I've never had incomplete PGs stuck like this before. AFAIK it means
> that osd.52 would need to be brought up before you can restore those
> PGs.
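>
> If a "ceph pg <pgid> query" ever returns for you, the recovery_state
> section should confirm that -- IIRC there are fields along the lines of
> "down_osds_we_would_probe" and "peering_blocked_by" that would point at
> osd.52 (double-check the exact names against your output).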
>
> Perhaps you'll need ceph-objectstore-tool to help dump osd.52 and
> bring up its data elsewhere. A quick check on this list pointed to
> https://ceph.com/community/incomplete-pgs-oh-my/ -- did you try that?
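>
> From memory that post boils down to exporting the PGs off the dead OSD's
> disk with ceph-objectstore-tool and importing them into a fresh, weight-0
> OSD which you then start, so the cluster can recover from that copy.
> Roughly (paths assume the default FileStore layout, "NNN" standing in for
> the temporary OSD's id):
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
>     --journal-path /var/lib/ceph/osd/ceph-52/journal \
>     --op export --pgid 1.1bdb --file /root/1.1bdb.export
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NNN \
>     --journal-path /var/lib/ceph/osd/ceph-NNN/journal \
>     --op import --file /root/1.1bdb.export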
>
> Or perhaps I'm spewing enough nonsense here that Cunningham's Law will
> bring you the solution.
>
> Cheers, Dan
>
>
>
> On Thu, May 5, 2016 at 8:21 PM,  <george.vasilaka...@stfc.ac.uk> wrote:
>> Hi folks,
>>
>> I've got a serious issue with a Ceph cluster that's used for RBD.
>>
>> There are 4 PGs stuck in an incomplete state, and I've been trying to repair
>> the problem to no avail.
>>
>> Here's ceph status:
>> health HEALTH_WARN
>>             4 pgs incomplete
>>             4 pgs stuck inactive
>>             4 pgs stuck unclean
>>             100 requests are blocked > 32 sec
>>      monmap e13: 3 mons at ...
>>             election epoch 2084, quorum 0,1,2 mon4,mon5,mon3
>>      osdmap e154083: 203 osds: 197 up, 197 in
>>       pgmap v37369382: 9856 pgs, 5 pools, 20932 GB data, 22321 kobjects
>>             64871 GB used, 653 TB / 716 TB avail
>>                 9851 active+clean
>>                    4 incomplete
>>                    1 active+clean+scrubbing
>>
>> The 4 PGs all have the same primary OSD, which is on a host that had its
>> OSDs turned off because it was quite flaky.
>>
>> pg       state         up              up_primary   acting          acting_primary
>> 1.1bdb   incomplete    [52,100,130]    52           [52,100,130]    52
>> 1.5c2    incomplete    [52,191,109]    52           [52,191,109]    52
>> 1.f98    incomplete    [52,92,37]      52           [52,92,37]      52
>> 1.11dc   incomplete    [52,176,12]     52           [52,176,12]     52
>>
>> One thing that strikes me as odd is that once osd.52 is taken out, these
>> sets change completely.
>> Currently, for each of these PGs, the three OSDs in the acting set hold
>> similar but different amounts of data, with osd.52 holding the smallest
>> amount (though not by much) in each case.
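>>
>> (For the comparison I just sized the PG directories on each acting OSD's
>> host, along the lines of:
>>
>> du -sh /var/lib/ceph/osd/ceph-52/current/1.5c2_head
>>
>> on each member of the acting set.)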
>>
>> Querying those PGs doesn't return a response even after a few minutes, and
>> manually triggering scrubs or repairs on them does nothing.
>> I've lowered min_size from 2 to 1 but I'm not seeing any recovery activity.
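>>
>> Roughly what I ran ("<pool>" standing in for the actual pool name):
>>
>> ceph pg 1.5c2 query                  # hangs for minutes
>> ceph pg scrub 1.5c2                  # no visible effect
>> ceph pg repair 1.5c2                 # no visible effect
>> ceph osd pool set <pool> min_size 1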
>>
>> Is there something that can be done to recover without losing that data?
>> (Losing it would mean each VM has a 75% chance of being destroyed.)