Hi,

This is my second attempt to get help with a problem I've been trying to solve 
for about 6 months now.

I have a Ceph 16.2.6 test cluster, used almost exclusively for providing RGW/S3 
service, similar to a production cluster.

The problem I have is this:
A client uploads (via S3) a bunch of large files into a bucket using multipart 
uploads.
The upload(s) get interrupted and retried.
In the end, from the client's perspective all the files are visible and 
everything looks fine.
But on the cluster there are many more objects in the bucket.
Even after cleaning out the incomplete multipart uploads there are too many 
objects.
Even after deleting all the visible objects from the bucket there are still 
objects in the bucket.
I have so far found no way to get rid of those left-over objects.
It's screwing up space accounting and I'm afraid I'll eventually have a cluster 
full of those lost objects.
The only way to clean up seems to be to copy the contents of the bucket to a new 
bucket and delete the screwed-up bucket. But on a production system that's not 
always a real option.

I've found a variety of older threads that describe a similar problem. None of 
them describes a solution :(



I can pretty easily reproduce the problem with this sequence:

On a client system create a directory with ~30 200MB files. (On a faster system 
I'd probably need bigger or more files)
tstfiles/tst01 - tst29

Run
$ rclone mkdir tester:/test-bucket # creates a bucket on the test system with user tester
Then run
$ rclone sync -v tstfiles tester:/test-bucket/tstfiles
a couple of times (6-8), interrupting each one via Ctrl-C.
Eventually let one finish.
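
For anyone who wants to reproduce this, the test directory can be generated with a short script. This is only a sketch: the function name make_test_files and its parameters are made up for illustration, and filling the files with random bytes is an assumption (the content of the original test files is not stated).

```python
import os
from pathlib import Path


def make_test_files(directory, count=29, size_bytes=200 * 1024 * 1024,
                    chunk=1024 * 1024):
    """Create files tst01..tstNN of size_bytes each; return their paths.

    The name and parameters are hypothetical helpers for this sketch.
    Random content is an assumption, not something stated in the post.
    """
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, count + 1):
        path = out / f"tst{i:02d}"
        remaining = size_bytes
        with open(path, "wb") as fh:
            # Write in chunks so a 200 MiB file never sits in memory at once.
            while remaining > 0:
                n = min(chunk, remaining)
                fh.write(os.urandom(n))
                remaining -= n
        paths.append(path)
    return paths
```

Calling make_test_files("tstfiles") with the defaults writes roughly 5.8 GB, matching the tstfiles/tst01 - tst29 layout above.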

Now I can use s3cmd to see all the files:
$ s3cmd ls -lr s3://test-bucket/tstfiles
2022-03-16 17:11   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst01
...
2022-03-16 17:13   200M  ecb28853bd18eeae185b0b12bd47333c-40  STANDARD     s3://test-bucket/tstfiles/tst29

... and to list incomplete uploads:
$ s3cmd multipart s3://test-bucket
s3://test-bucket/
Initiated       Path    Id
2022-03-16T17:11:19.074Z        s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
...
2022-03-16T17:12:41.583Z        s3://test-bucket/tstfiles/tst28 2~exVQUILhVSmFqWxCuAflRa4Tfq4nUQa

I can abort the uploads with
$ s3cmd abortmp s3://test-bucket/tstfiles/tst05 2~1nElF0c3uq5FnZ9cKlsnGlXKATvjr0g
...

Afterwards  s3cmd multipart s3://test-bucket doesn't list anything.
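
Aborting every listed upload by hand gets tedious, so the listing can be turned into abortmp invocations mechanically. The following is a rough sketch: the function name abort_commands is made up, and it assumes the three-column "Initiated / Path / Id" layout shown above and object keys without whitespace.

```python
def abort_commands(listing):
    """Build one `s3cmd abortmp <path> <id>` argv list per listed upload.

    `listing` is the text printed by `s3cmd multipart s3://bucket`.
    Assumes three whitespace-separated columns and keys without spaces;
    the bucket line and the header line are skipped.
    """
    commands = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1].startswith("s3://"):
            _initiated, path, upload_id = fields
            commands.append(["s3cmd", "abortmp", path, upload_id])
    return commands
```

Each returned list can be handed to subprocess.run(cmd, check=True). Note that this only reaches uploads that s3cmd multipart still lists, which is exactly the limitation described in this post.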

Now I delete all the objects in the bucket 
$ s3cmd rm -r s3://test-bucket/tstfiles

... and as a client I think everything is cleaned up: no stored data to pay for.


Then I go to the Ceph cluster.
In the dashboard I see 80MB in 16 objects in the bucket test-bucket.
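
As a sanity check, the numbers are consistent with the leftovers being whole multipart parts. The arithmetic below is a sketch that assumes the "-40" ETag suffix in the s3cmd listing is the part count of each upload and that "200M" and "80MB" are binary units (the bucket stats further down report 83886080 bytes).

```python
MIB = 1024 * 1024

file_size = 200 * MIB        # "200M" per file in the s3cmd listing
parts_per_file = 40          # "-40" suffix of the multipart ETag (assumed part count)
part_size = file_size // parts_per_file

leftover_bytes = 80 * MIB    # "80MB" in the dashboard, read here as MiB
leftover_objects = 16
avg_leftover = leftover_bytes // leftover_objects

print(part_size // MIB, avg_leftover // MIB)  # prints "5 5"
```

So each leftover object is the size of exactly one 5 MiB upload part.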

# radosgw-admin bi list --bucket test-bucket | grep idx
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.4",
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.5",
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.6",
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.7",
        "idx": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.3",
        "idx": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.5",
        "idx": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.6",
        "idx": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.5",
        "idx": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.6",
        "idx": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.7",
        "idx": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.8",
        "idx": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.14",
        "idx": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.15",
        "idx": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.17",
        "idx": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.19",
        "idx": "_multipart_tstfiles/tst18.2~VaXq_AAM1JUb4Gtuw7zfYiKpBaYJSbs.1",
# radosgw-admin bucket stats --bucket test-bucket
{
    "bucket": "test-bucket",
    "num_shards": 30,
    "tenant": "",
    "zonegroup": "c44f5696-8f78-4c74-b657-15f611de1969",
    "placement_rule": "ec21",
    "explicit_placement": {
        "data_pool": "",
        "data_extra_pool": "",
        "index_pool": ""
    },
    "id": "3d925f45-51bd-4442-817d-acc4d77f6ba3.1925174.1",
    "marker": "3d925f45-51bd-4442-817d-acc4d77f6ba3.1925174.1",
    "index_type": "Normal",
    "owner": "tester",
    "ver": "0#538,1#785,2#360,3#352,4#213,5#4,6#174,7#4,8#4,9#186,10#246,11#244,12#182,13#4,14#4,15#4,16#354,17#4,18#4,19#172,20#183,21#378,22#211,23#172,24#4,25#4,26#4,27#208,28#224,29#363",
    "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0,27#0,28#0,29#0",
    "mtime": "2022-03-16T17:02:38.754412Z",
    "creation_time": "2022-03-16T17:02:38.715908Z",
    "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#",
    "usage": {
        "rgw.main": {
            "size": 83886080,
            "size_actual": 83886080,
            "size_utilized": 83886080,
            "size_kb": 81920,
            "size_kb_actual": 81920,
            "size_kb_utilized": 81920,
            "num_objects": 16
        }
    },
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    }
}
# radosgw-admin bucket check --fix --check-objects --bucket test-bucket
[
    "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.5",
    "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.6",
    "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.7",
    "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.8",
    "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.3",
    "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.5",
    "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.6",
    "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.4",
    "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.5",
    "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.6",
    "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.7",
    "_multipart_tstfiles/tst18.2~VaXq_AAM1JUb4Gtuw7zfYiKpBaYJSbs.1",
    "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.14",
    "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.15",
    "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.17",
    "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.19"
]
{
    "object": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.5",
    "object": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.6",
    "object": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.7",
    "object": "_multipart_tstfiles/tst05.2~5l2zkUs__QvrD7pMevtFFhia47pdHLO.8",
    "object": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.3",
    "object": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.5",
    "object": "_multipart_tstfiles/tst10.2~QlUi7pxm5KnQZEzPh-osjXGRe0jI-hS.6",
    "object": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.4",
    "object": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.5",
    "object": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.6",
    "object": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.7",
    "object": "_multipart_tstfiles/tst18.2~VaXq_AAM1JUb4Gtuw7zfYiKpBaYJSbs.1",
    "object": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.14",
    "object": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.15",
    "object": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.17",
    "object": "_multipart_tstfiles/tst29.2~7ZrDo14zBvVBSASif1c8KmqXdwyzvzn.19"
}
{
    "existing_header": {
        "usage": {
            "rgw.main": {
                "size": 83886080,
                "size_actual": 83886080,
                "size_utilized": 83886080,
                "size_kb": 81920,
                "size_kb_actual": 81920,
                "size_kb_utilized": 81920,
                "num_objects": 16
            },
            "rgw.multimeta": {
                "size": 0,
                "size_actual": 0,
                "size_utilized": 0,
                "size_kb": 0,
                "size_kb_actual": 0,
                "size_kb_utilized": 0,
                "num_objects": 0
            }
        }
    },
    "calculated_header": {
        "usage": {
            "rgw.main": {
                "size": 83886080,
                "size_actual": 83886080,
                "size_utilized": 83886080,
                "size_kb": 81920,
                "size_kb_actual": 81920,
                "size_kb_utilized": 81920,
                "num_objects": 16
            }
        }
    }
}
Running it again only removes the "rgw.multimeta" part. The objects are still 
there, taking up space.
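
To see which uploads the leftover index entries belong to, the idx values from the bi list output above can be grouped by object key and upload id. This is a sketch only: the helper names are made up, and the pattern assumes the "_multipart_<key>.<upload id>.<part number>" layout seen in the output, with upload ids that start with "2~" and contain no dots.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the idx values shown in this post:
#   _multipart_<object key>.<upload id starting with 2~>.<part number>
PART_RE = re.compile(
    r"^_multipart_(?P<key>.+)\.(?P<upload_id>2~[^.]+)\.(?P<part>\d+)$"
)


def idx_values_from_grep(text):
    """Pull the quoted values out of `bi list ... | grep idx` output lines."""
    return re.findall(r'"idx":\s*"([^"]*)"', text)


def group_leftover_parts(idx_values):
    """Map (object key, upload id) to the sorted list of leftover part numbers."""
    groups = defaultdict(list)
    for idx in idx_values:
        match = PART_RE.match(idx)
        if match:
            groups[(match["key"], match["upload_id"])].append(int(match["part"]))
    return {key: sorted(parts) for key, parts in groups.items()}


sample = '''
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.5",
        "idx": "_multipart_tstfiles/tst11.2~mki7cgVbg1dAh22EMZKJgmlUMveKBjx.4",
        "idx": "_multipart_tstfiles/tst18.2~VaXq_AAM1JUb4Gtuw7zfYiKpBaYJSbs.1",
'''
for (key, upload_id), parts in group_leftover_parts(idx_values_from_grep(sample)).items():
    print(key, upload_id, parts)
```

The upload ids found this way can be compared against what s3cmd multipart reports, to confirm that the leftover parts belong to uploads the S3 API no longer lists.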

So there's obviously a bug that makes those multipart pieces "invisible".
Is there any way to get rid of them without damaging the whole RGW installation?
I've tried a bunch of things; some do nothing (e.g. lifecycle policies for 
multipart uploads, which only work on the visible multiparts), and after a few 
uninformed rados attempts I had to recreate the RGW installation a couple of 
times.

Would really appreciate any help.

Ciao, Uli