What exactly do you mean by log? As in a journal of the actions taken, or
the logging done by a daemon?
I'm making the same guess, but I'm not sure what else I can try at this point.
The PG I've been working on reports that it needs to probe 4 OSDs (the new
set plus the old primary), all of which are up, hold the same amount of data,
and were last changed at the same time. The PG is still incomplete. I'll be
raising the logging levels to max today to see what pops up.
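For reference, this is roughly how I plan to raise the debug levels on the
probed OSDs (the OSD IDs are placeholders for the four in the probing set;
I believe the Hammer defaults are debug-osd 0/5 and debug-ms 0/5, so I'll
inject those back afterwards):

    ceph tell osd.<id> injectargs '--debug-osd 20/20 --debug-ms 1/1'
    # ... repeated for each of the four OSDs in the probing set
    ceph tell osd.<id> injectargs '--debug-osd 0/5 --debug-ms 0/5'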
________________________________________
From: Dan van der Ster [d...@vanderster.com]
Sent: 12 May 2016 09:26
To: Vasilakakos, George (STFC,RAL,SC)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Incomplete PGs, how do I get them back without data 
loss?

On Wed, May 11, 2016 at 6:53 PM,  <george.vasilaka...@stfc.ac.uk> wrote:
> Hey Dan,
>
> This is on Hammer 0.94.5. osd.52 was always on a problematic machine and,
> when this happened, had less data on its local disk than the other OSDs. I've
> tried adapting that blog post's solution to this situation, to no avail.


Do you have a log of what you did and why it didn't work? I guess the
solution to your issue lies in a version of that procedure.

-- dan




>
> I've tried things like looking at all the probing OSDs in the query output
> and importing the data from one copy to all of them to get them consistent.
> One of the major red flags here was that when I looked at the original acting
> set's disks, I found each OSD had a different amount of data for the same PG.
> There is at least one PG here where 52 (the primary for all four) actually
> had about 1GB (~27%) less data; everything has just been really inconsistent.
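> For the record, the import attempts were roughly along these lines (PG ID,
> OSD number and paths are illustrative; the OSDs were stopped first, and the
> export had been taken beforehand from the copy that looked most complete):
>
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
>         --journal-path /var/lib/ceph/osd/ceph-100/journal \
>         --pgid 1.5c2 --op remove
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-100 \
>         --journal-path /var/lib/ceph/osd/ceph-100/journal \
>         --pgid 1.5c2 --op import --file /tmp/pg1.5c2.export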
>
> Here's hoping Cunningham will come to the rescue.
>
> Cheers,
>
> George
>
> ________________________________________
> From: Dan van der Ster [d...@vanderster.com]
> Sent: 11 May 2016 17:28
> To: Vasilakakos, George (STFC,RAL,SC)
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Incomplete PGs, how do I get them back without data 
> loss?
>
> Hi George,
>
> Which version of Ceph is this?
> I've never had incomplete PGs stuck like this before. AFAIK it means
> that osd.52 would need to be brought up before you can restore those
> PGs.
>
> Perhaps you'll need ceph-objectstore-tool to help dump osd.52 and
> bring up its data elsewhere. A quick check on this list pointed to
> https://ceph.com/community/incomplete-pgs-oh-my/ -- did you try that?
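> As a minimal sketch of the export step from that post (assuming default OSD
> paths, run with osd.52 stopped; the PG ID here is just one of your four):
>
>     ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-52 \
>         --journal-path /var/lib/ceph/osd/ceph-52/journal \
>         --pgid 1.1bdb --op export --file /tmp/pg1.1bdb.export
>
> The exported file can then be imported into another (stopped) OSD with
> --op import.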
>
> Or perhaps I'm spewing enough nonsense here that Cunningham's Law will
> bring you the solution.
>
> Cheers, Dan
>
>
>
> On Thu, May 5, 2016 at 8:21 PM,  <george.vasilaka...@stfc.ac.uk> wrote:
>> Hi folks,
>>
>> I've got a serious issue with a Ceph cluster that's used for RBD.
>>
>> There are 4 PGs stuck in an incomplete state, and my attempts to repair the
>> problem have so far been unsuccessful.
>>
>> Here's ceph status:
>> health HEALTH_WARN
>>             4 pgs incomplete
>>             4 pgs stuck inactive
>>             4 pgs stuck unclean
>>             100 requests are blocked > 32 sec
>>      monmap e13: 3 mons at ...
>>             election epoch 2084, quorum 0,1,2 mon4,mon5,mon3
>>      osdmap e154083: 203 osds: 197 up, 197 in
>>       pgmap v37369382: 9856 pgs, 5 pools, 20932 GB data, 22321 kobjects
>>             64871 GB used, 653 TB / 716 TB avail
>>                 9851 active+clean
>>                    4 incomplete
>>                    1 active+clean+scrubbing
>>
>> The 4 PGs all have the same primary OSD, which is on a host that had its
>> OSDs turned off because it was quite flaky.
>>
>> pg_id     state         up              up_primary  acting          acting_primary
>> 1.1bdb    incomplete    [52,100,130]    52          [52,100,130]    52
>> 1.5c2     incomplete    [52,191,109]    52          [52,191,109]    52
>> 1.f98     incomplete    [52,92,37]      52          [52,92,37]      52
>> 1.11dc    incomplete    [52,176,12]     52          [52,176,12]     52
>>
>> One thing that strikes me as odd is that once osd.52 is taken out, these
>> sets change completely.
>> Currently, for each of these PGs, the three OSDs hold different amounts of
>> data: the amounts are similar but not equal, with osd.52 holding the
>> smallest (though not by much) in each case.
>>
>> Querying those PGs doesn't return a response even after a few minutes, and
>> manually triggering scrubs or repairs on them does nothing.
>> I've lowered min_size from 2 to 1, but I'm not seeing any activity to fix
>> this.
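>>
>> For reference, the commands were along these lines (the PG ID is one of the
>> four above; the pool name is a placeholder):
>>
>>     ceph pg 1.1bdb query                 # hangs
>>     ceph pg scrub 1.1bdb                 # no visible effect
>>     ceph pg repair 1.1bdb                # no visible effect
>>     ceph osd pool set <pool> min_size 1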
>>
>> Is there something that can be done to recover without losing that data?
>> (Losing it would mean each VM has a 75% chance of being destroyed.)
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com