The main problem with efficiently listing many-sharded buckets is the requirement to return entries in sorted order. This means that each HTTP request has to fetch ~1000 entries from every shard, merge them into a single sorted order, and throw out the leftovers. The next request to continue the listing advances its position slightly, but still ends up fetching many of the same entries from each shard. The more shards there are, the more these per-shard listings overlap, and the worse the listing performance gets.
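
Roughly, the per-request work looks like this (a toy Python sketch of the merge, not actual radosgw code; the shard count and page size are illustrative):

import heapq

NUM_SHARDS = 32      # illustrative; real buckets can have many more shards
PAGE_SIZE = 1000     # entries requested from each shard per listing request

def list_shard(shard_id, marker=""):
    # Stand-in for one index shard: returns up to PAGE_SIZE of its keys,
    # sorted, starting after `marker` (real shards are RADOS omap objects).
    keys = (f"obj-{i:07d}" for i in range(shard_id, 200000, NUM_SHARDS))
    return sorted(k for k in keys if k > marker)[:PAGE_SIZE]

def list_bucket_page(marker=""):
    # Every request fetches PAGE_SIZE entries from *every* shard...
    per_shard = [list_shard(s, marker) for s in range(NUM_SHARDS)]
    # ...merges them into one globally sorted stream...
    merged = heapq.merge(*per_shard)
    # ...and keeps only the first PAGE_SIZE entries.  The leftovers are
    # thrown away and largely re-fetched by the next request.
    return [key for _, key in zip(range(PAGE_SIZE), merged)]

page = list_bucket_page()
next_page = list_bucket_page(marker=page[-1])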

Eric Ivancich recently added S3 and Swift extensions for unordered bucket listing in https://github.com/ceph/ceph/pull/21026 (for mimic). That allows radosgw to list each shard separately and skip the merge step that throws away extra entries. If your application can tolerate unsorted listings, that could be a big help without having to resort to indexless buckets.
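
For example, with boto3 you could inject the non-standard allow-unordered query parameter through a botocore event hook. This is an untested sketch (the endpoint, credentials and bucket name are placeholders), and as far as I know unordered listing can't be combined with a delimiter:

import boto3

# Sketch: ask radosgw for an unordered listing by adding its non-standard
# "allow-unordered" query parameter to the ListObjects call via a botocore
# event hook.  Endpoint, credentials and bucket name below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://rgw.example.com:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def _allow_unordered(params, **kwargs):
    # params["query_string"] holds the serialized query parameters for the
    # request about to be sent; add the rgw extension to it.
    params["query_string"]["allow-unordered"] = "true"

s3.meta.events.register("before-call.s3.ListObjects", _allow_unordered)

paginator = s3.get_paginator("list_objects")
for page in paginator.paginate(Bucket="bigbucket"):
    for obj in page.get("Contents", []):
        print(obj["Key"])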


On 05/01/2018 11:09 AM, Robert Stanford wrote:

I second the indexless bucket suggestion.  The downside is that you can't use features like object expiration in that case.

On Tue, May 1, 2018 at 10:02 AM, David Turner <drakonst...@gmail.com> wrote:

    Any time I'm using shared storage like S3 or cephfs/nfs/gluster/etc,
    the absolute rule that I refuse to break is to never rely on a
    directory listing to know where objects/files are.  You should be
    maintaining a database of some sort or using a deterministic naming
    scheme.  The only time a full listing of a directory should be
    required is if you feel like your tooling is orphaning files and
    you want to clean them up.  If I had someone with a bucket with 2B
    objects, I would force them to use an indexless bucket.
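
    For illustration, a deterministic naming scheme can be as simple as
    deriving the object key from data the application already has, so a
    lookup never needs a listing (a hypothetical sketch, names made up):

    import hashlib

    # Hypothetical deterministic naming scheme: the key is computed from
    # identifiers the application already stores, so it can be recomputed
    # at read time instead of being discovered via a bucket listing.
    def object_key(user_id: str, upload_ts: int, filename: str) -> str:
        digest = hashlib.sha256(f"{user_id}/{filename}".encode()).hexdigest()
        # A short hash prefix also spreads keys across index shards.
        return f"{digest[:4]}/{user_id}/{upload_ts}/{filename}"

    # The same inputs always map to the same key, so no listing is needed:
    print(object_key("alice", 1525190400, "report.pdf"))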

    That's me, though.  I'm sure there are other ways to manage a bucket
    like that, but it sounds awful.

    On Tue, May 1, 2018 at 10:10 AM Robert Stanford
    <rstanford8...@gmail.com> wrote:


        Listing will always take forever when using a high shard
        count, AFAIK.  That's the tradeoff for sharding.  Are those
        2B objects in one bucket?  How's your read and write
        performance compared to a bucket with a lower number
        (thousands) of objects, at that shard count?

        On Tue, May 1, 2018 at 7:59 AM, Katie Holly
        <8ld3j...@meo.ws> wrote:

            One of our radosgw buckets has grown a lot in size: `rgw
            bucket stats --bucket $bucketname` reports a total of
            2,110,269,538 objects, with the bucket index sharded across
            32768 shards.  Listing the root context of the bucket with
            `s3 ls s3://$bucketname` takes more than an hour, which is
            the hard time-to-first-byte limit on our nginx reverse
            proxy, and the aws-cli times out long before that limit is
            hit.

            The software we use supports sharding the data across
            multiple S3 buckets, but before I go ahead and enable this:
            has anyone ever had this many objects in a single RGW
            bucket, and how did you solve the problem of RGW taking a
            long time to read the full index?

            -- Best regards

            Katie Holly