Looks like I’ve now got a consistent repro scenario. Please find the gory 
details here: http://tracker.ceph.com/issues/20380


On 20/06/17, 2:04 PM, "Pavan Rallabhandi" <prallabha...@walmartlabs.com> wrote:

    Hi Orit,
    No, we do not use multi-site.
    From: Orit Wasserman <owass...@redhat.com>
    Date: Tuesday, 20 June 2017 at 12:49 PM
    To: Pavan Rallabhandi <prallabha...@walmartlabs.com>
    Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
    Subject: EXT: Re: [ceph-users] FW: radosgw: stale/leaked bucket index 
    Hi Pavan, 
    On Tue, Jun 20, 2017 at 8:29 AM, Pavan Rallabhandi 
<prallabha...@walmartlabs.com> wrote:
    Trying one more time with ceph-users
    On 19/06/17, 11:07 PM, "Pavan Rallabhandi" <prallabha...@walmartlabs.com> wrote:
        On many of our clusters running Jewel (10.2.5+), I am running into a 
strange problem: stale bucket index entries are left over for some of the 
deleted objects. Though it is not reproducible at will, it has been pretty 
consistent of late, and I am clueless at this point as to why it happens.
        The symptoms are that the delete operation on an object is 
reported as successful in the RGW logs, but a bucket listing on the container 
still shows the deleted object. An attempt to download or stat the object 
fails, as expected. No failures are seen on the respective OSDs 
where the bucket index object is located. Rebuilding the bucket index by 
running ‘radosgw-admin bucket check --fix’ fixes the issue.
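The symptom described here (an object that is still listed but can no longer be stat'ed) can be checked for mechanically. The following is only a minimal sketch: `find_stale_entries`, the `listing` list and the `stored` set are hypothetical stand-ins, and in a real check the listing would come from a Swift container GET and the existence probe from an object HEAD.

```python
from typing import Callable, Iterable, List


def find_stale_entries(
    listed_names: Iterable[str],
    object_exists: Callable[[str], bool],
) -> List[str]:
    """Return names that the listing reports but that cannot be stat'ed."""
    return [name for name in listed_names if not object_exists(name)]


# Stand-in data (assumption): the listing still shows "b.txt",
# but only "a.txt" is actually stored.
listing = ["a.txt", "b.txt"]
stored = {"a.txt"}

stale = find_stale_entries(listing, lambda name: name in stored)
print(stale)
```

Any name returned by such a check is a candidate stale index entry of the kind described in this thread.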
        I could simulate the problem by instrumenting the code so that it does 
not invoke `complete_del` on the bucket index op 
https://github.com/ceph/ceph/blob/master/src/rgw/rgw_rados.cc#L8793, but that 
call always seems to be made unless there is a cascading error from the 
actual delete operation on the object, which doesn’t seem to be the case here.
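To illustrate why skipping the index completion step produces exactly these symptoms, here is a toy model. It is not RGW code: `ToyBucket`, `skip_complete` and `check_fix` are hypothetical names, and the real bucket index logic is considerably more involved. The sketch only assumes that the listing is served from an index kept separately from the object data.

```python
class ToyBucket:
    """Toy model only: an index (what listings show) plus a data store."""

    def __init__(self):
        self.index = set()
        self.data = {}

    def put(self, name, body):
        self.data[name] = body
        self.index.add(name)

    def delete(self, name, skip_complete=False):
        # Step 1: remove the object data itself.
        self.data.pop(name, None)
        # Step 2: complete the index operation by dropping the index entry.
        # skip_complete=True imitates the instrumented code described above.
        if not skip_complete:
            self.index.discard(name)

    def list(self):
        return sorted(self.index)

    def stat(self, name):
        return name in self.data

    def check_fix(self):
        # Rebuild the index from the data that is actually present.
        self.index = set(self.data)


bucket = ToyBucket()
bucket.put("a", b"1")
bucket.put("b", b"2")
bucket.delete("a")                      # normal delete
bucket.delete("b", skip_complete=True)  # completion step skipped

listed_before_fix = bucket.list()       # "b" is still listed
stat_b = bucket.stat("b")               # but it cannot be stat'ed
bucket.check_fix()
listed_after_fix = bucket.list()
```

In this model the skipped completion leaves "b" in the listing although its data is gone, and rebuilding the index from the stored data removes the stale entry, loosely analogous to what the bucket check does.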
        I wanted to know the possible reasons why the bucket index would be 
left in such limbo; any pointers would be much appreciated. FWIW, we are not 
sharding the buckets, and very recently I’ve seen this happen with buckets 
holding fewer than 10 objects. We are using Swift for all the operations.
    Do you use multisite? 
    ceph-users mailing list
