Re: [ceph-users] Fixing a rgw bucket index

Erdem Agaoglu Mon, 08 Apr 2013 10:35:26 -0700

This is the log, grepped for the relevant thread ID. It shows the 400 in the
last lines, but nothing else seems odd.
http://pastebin.com/xWCYmnXV


Thanks for your interest.


On Mon, Apr 8, 2013 at 8:21 PM, Yehuda Sadeh <yeh...@inktank.com> wrote:

> Each bucket has a unique prefix which you can get by doing radosgw-admin
> bucket stats on that bucket. You can grep that prefix in 'rados ls -p
> .rgw.buckets'.
>
> Do you have any rgw log showing why you get the Invalid Request response?
> Can you also add 'debug ms = 1' for the log?
>
> Thanks
>
>
> On Mon, Apr 8, 2013 at 10:12 AM, Erdem Agaoglu <erdem.agao...@gmail.com> wrote:
>
>> Just tried that file:
>>
>> $ s3cmd mv s3://imgiz/data/avatars/492/492923.jpg s3://imgiz/data/avatars/492/492923.jpg
>> ERROR: S3 error: 400 (InvalidRequest)
>>
>> More verbose output shows that the sign-headers were
>> 'PUT\n\n\n\nx-amz-copy-source:/imgiz/data/avatars/492/492923.jpg\nx-amz-date:Mon, 08 Apr 2013 16:59:30 +0000\nx-amz-metadata-directive:COPY\n/imgiz/data/avatars/492/492923.jpg'
>>
>> But I guess it doesn't work even when the index is correct: I get the same
>> response on a clean bucket too.
>>
>> We might try copying to a different bucket, but we don't have a file list.
>> I guess it's possible to build one with 'rados ls | grep | sed'?
>>
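On the file list question: a minimal sketch of the 'rados ls | grep | sed'
idea, assuming every object of the bucket is stored as '<prefix>_<key>' (as in
the 4470.1_data/avatars/... names quoted in this thread). bucket_keys is a
hypothetical helper name. It uses awk rather than grep and sed so that the dot
in the prefix is matched literally instead of as a regex wildcard, and it does
not try to recognise any internal objects that may share the prefix.

```shell
# Hypothetical helper: read RADOS object names on stdin and print the
# bucket-relative keys for one bucket prefix (e.g. 4470.1).
bucket_keys() {
    prefix="$1"
    # index() is a literal substring match, so "4470.1_" cannot match "447011_".
    awk -v p="${prefix}_" 'index($0, p) == 1 { print substr($0, length(p) + 1) }'
}

# Intended use (not run here):
#   rados ls -p .rgw.buckets | bucket_keys 4470.1 > keys.txt
```

Object names containing newlines would break the one-name-per-line assumption.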
>>
>> On Mon, Apr 8, 2013 at 7:53 PM, Yehuda Sadeh <yeh...@inktank.com> wrote:
>>
>>> Can you try copying one of these objects to itself? Would that work
>>> and/or change the index entry? Another option would be to try copying all
>>> the objects to a different bucket.
>>>
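The 'copy everything to a different bucket' option could be driven from a key
list with one key per line. A rough sketch, assuming the destination bucket
already exists; copy_keys and the S3CMD override are names made up for this
example, and 's3cmd cp' is the only real command it relies on.

```shell
# Hypothetical helper: copy every key read from stdin from one bucket to another.
# S3CMD can be overridden (e.g. S3CMD=echo) for a dry run.
copy_keys() {
    src="$1"
    dst="$2"
    while IFS= read -r key; do
        [ -n "$key" ] || continue          # skip blank lines
        # </dev/null keeps the command from consuming the key list on stdin
        ${S3CMD:-s3cmd} cp "s3://$src/$key" "s3://$dst/$key" </dev/null ||
            echo "copy failed: $key" >&2
    done
}

# Intended use (not run here; imgiz-copy is a made-up bucket name):
#   copy_keys imgiz imgiz-copy < keys.txt
```

Running it once with S3CMD=echo prints the commands without touching the
cluster, which seems a sensible first step on a bucket in this state.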
>>>
>>> On Mon, Apr 8, 2013 at 9:48 AM, Erdem Agaoglu <erdem.agao...@gmail.com> wrote:
>>>
>>>> The omap header and all other omap attributes were destroyed. I copied
>>>> another index over the destroyed one to get a somewhat valid header, and
>>>> it seems intact now. After a 'check --fix':
>>>>
>>>>  # rados -p .rgw.buckets getomapheader .dir.4470.1
>>>> header (49 bytes) :
>>>> 0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.............
>>>> 0010 : 00 7d 7a 3f 6e 01 00 00 00 00 d0 00 7e 01 00 00 : .}z?n.......~...
>>>> 0020 : 00 bb f5 01 00 00 00 00 00 00 00 00 00 00 00 00 : ................
>>>> 0030 : 00                                              : .
>>>>
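Reading the dump above by eye, the 8 bytes starting at offset 0x21 look like a
little-endian 64-bit counter: bb f5 01 00 ... is 128443, the same num_objects
that 'bucket check' reports for imgiz further down. This layout is inferred
only from the numbers in this thread, not from any documented format, and le64
is a hypothetical helper for converting such a group.

```shell
# Hypothetical helper: print the decimal value of 8 hex bytes given in
# little-endian order (least significant byte first), as they appear in the dump.
le64() {
    # Reverse the byte order into one hex literal and let printf convert it.
    printf '%d\n' "0x$8$7$6$5$4$3$2$1"
}

le64 bb f5 01 00 00 00 00 00    # bytes at offset 0x21; prints 128443
```

Values of 2^63 and above would overflow the shell's signed arithmetic, which
is not a concern for counters of this size.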
>>>>
>>>> Rados shows objects are there:
>>>>
>>>> # rados ls -p .rgw.buckets |grep 4470.1_data/avatars
>>>> 4470.1_data/avatars/11047/11047823_20101211154308.jpg
>>>> 4470.1_data/avatars/106/106976-orig
>>>> 4470.1_data/avatars/492/492923.jpg
>>>> 4470.1_data/avatars/275/275479.jpg
>>>> ...
>>>>
>>>>
>>>> And I am able to GET them:
>>>>
>>>> $ s3cmd get s3://imgiz/data/avatars/492/492923.jpg
>>>> s3://imgiz/data/avatars/492/492923.jpg -> ./492923.jpg  [1 of 1]
>>>>  3587 of 3587   100% in    0s    93.40 kB/s  done
>>>>
>>>>
>>>> But I am unable to list them:
>>>>
>>>> $ s3cmd ls s3://imgiz/data/avatars
>>>> <NOTHING>
>>>>
>>>>
>>>> My initial expectation was that 'bucket check --fix --check-objects'
>>>> would actually read the objects, like 'rados ls' does, and recreate the
>>>> missing omap keys, but it doesn't seem to do that. Now a simple check says:
>>>>
>>>> # radosgw-admin bucket check -b imgiz
>>>> { "existing_header": { "usage": { "rgw.main": { "size_kb": 6000607,
>>>>               "size_kb_actual": 6258740,
>>>>               "num_objects": 128443}}},
>>>>   "calculated_header": { "usage": { "rgw.main": { "size_kb": 6000607,
>>>>               "size_kb_actual": 6258740,
>>>>               "num_objects": 128443}}}}
>>>>
>>>> But I know we have more than 128k objects.
>>>>
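One way to back up the 'more than 128k' figure would be to count the bucket's
objects directly in RADOS and compare the result with num_objects. A small
sketch, assuming the '<prefix>_' naming seen in the 'rados ls' output above
and that each such entry corresponds to one listable object, which may not
hold for any internal objects sharing the prefix. count_bucket_objects is a
made-up name.

```shell
# Hypothetical helper: count RADOS object names on stdin that start with "<prefix>_".
count_bucket_objects() {
    # n + 0 makes awk print 0 instead of an empty string when nothing matches.
    awk -v p="${1}_" 'index($0, p) == 1 { n++ } END { print n + 0 }'
}

# Intended use (not run here):
#   rados ls -p .rgw.buckets | count_bucket_objects 4470.1
```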
>>>>
>>>>
>>>> On Mon, Apr 8, 2013 at 7:17 PM, Yehuda Sadeh <yeh...@inktank.com> wrote:
>>>>
>>>>> We'll need to have more info about the current state. Was just the
>>>>> omap header destroyed, or does it still exist? What does the header
>>>>> contain now? Are you able to actually access objects in that bucket,
>>>>> but just fail to list them?
>>>>>
>>>>> On Mon, Apr 8, 2013 at 8:34 AM, Erdem Agaoglu <erdem.agao...@gmail.com> wrote:
>>>>> > Hi again,
>>>>> >
>>>>> > I managed to replace the index object with some other bucket's index.
>>>>> > --check-objects --fix ran, but my hopes were dashed: it didn't actually
>>>>> > read through the files or fix anything. Any suggestions?
>>>>> >
>>>>> >
>>>>> > On Thu, Apr 4, 2013 at 5:56 PM, Erdem Agaoglu <erdem.agao...@gmail.com> wrote:
>>>>> >>
>>>>> >> Hi all,
>>>>> >>
>>>>> >> After a major failure, and getting our cluster health back to OK (with
>>>>> >> some help from Inktank folks, thanks), we found out that we have managed
>>>>> >> to corrupt one of our bucket indices. As far as I can track it, we are
>>>>> >> missing the omap header on that specific index, so we're unable to use
>>>>> >> the radosgw-admin tools to fix it.
>>>>> >>
>>>>> >> While a healthy (smaller) bucket answers
>>>>> >> # radosgw-admin bucket check -b imgdoviz
>>>>> >> { "existing_header": { "usage": { "rgw.main": { "size_kb": 4140,
>>>>> >>               "size_kb_actual": 4484,
>>>>> >>               "num_objects": 157}}},
>>>>> >>   "calculated_header": { "usage": { "rgw.main": { "size_kb": 4140,
>>>>> >>               "size_kb_actual": 4484,
>>>>> >>               "num_objects": 157}}}}
>>>>> >>
>>>>> >> The faulty one fails with
>>>>> >> # radosgw-admin bucket check -b imgiz
>>>>> >> failed to list objects in bucket=imgiz(@.rgw.buckets[4470.1]) err=(22) Invalid argument
>>>>> >> failed to check index err=(22) Invalid argument
>>>>> >>
>>>>> >> When I push further
>>>>> >> # radosgw-admin bucket check -b imgiz --check-objects --fix
>>>>> >> failed to list objects in bucket=imgiz(@.rgw.buckets[4470.1]) err=(22) Invalid argument
>>>>> >> Checking objects, decreasing bucket 2-phase commit timeout.
>>>>> >> ** Note that timeout will reset only when operation completes successfully **
>>>>> >> ERROR: failed operation r=-22
>>>>> >> ERROR: failed operation r=-22
>>>>> >> ..
>>>>> >> The last line keeps repeating without any progress.
>>>>> >>
>>>>> >> I checked the omap headers of the index objects, and while the healthy bucket has:
>>>>> >> # rados -p .rgw.buckets getomapheader .dir.6912.3
>>>>> >> header (49 bytes) :
>>>>> >> 0000 : 03 02 2b 00 00 00 01 00 00 00 01 02 02 18 00 00 : ..+.............
>>>>> >> 0010 : 00 a8 af 40 00 00 00 00 00 00 10 46 00 00 00 00 : ...@.......F....
>>>>> >> 0020 : 00 9d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
>>>>> >> 0030 : 00                                              : .
>>>>> >>
>>>>> >> the faulty one is missing it
>>>>> >> # rados -p .rgw.buckets getomapheader .dir.4470.1
>>>>> >> header (0 bytes) :
>>>>> >>
>>>>> >>
>>>>> >> I'm currently in the process of understanding how to create a readable
>>>>> >> header. My hope is that even if it's wrong, radosgw-admin will be able
>>>>> >> to read through the objects and fix the necessary parts. But I'm not
>>>>> >> sure how to set the new header. It seems there is a setomapheader
>>>>> >> counterpart for getomapheader, but it only accepts values from the
>>>>> >> command line, so I don't know how to push an rgw-readable binary with it.
>>>>> >>
>>>>> >> Is this somewhat possible?
>>>>> >>
>>>>> >> Thanks in advance.
>>>>> >>
>>>>> >> --
>>>>> >> erdem agaoglu
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > erdem agaoglu
>>>>> >
>>>>> > _______________________________________________
>>>>> > ceph-users mailing list
>>>>> > ceph-users@lists.ceph.com
>>>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> erdem agaoglu
>>>>
>>>
>>>
>>
>>
>> --
>> erdem agaoglu
>>
>
>


-- 
erdem agaoglu

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
