On Wednesday, September 10, 2014, Daniel Schneller <
daniel.schnel...@centerdevice.com> wrote:

> On 09 Sep 2014, at 21:43, Gregory Farnum <g...@inktank.com> wrote:
>> Yehuda can talk about this with more expertise than I can, but I think
>> it should be basically fine. By creating so many buckets you're
>> decreasing the effectiveness of RGW's metadata caching, which means
>> the initial lookup in a particular bucket might take longer.
>
> Thanks for your thoughts. With “initial lookup in a particular bucket”
> do you mean accessing any of the objects in a bucket? If we directly
> access the object (not enumerating the bucket's content), would that
> still be an issue?
> Just trying to understand the inner workings a bit better to make
> more educated guesses :)

When doing an object lookup, the gateway combines the "bucket ID" with a
mangled version of the object name to try and do a read out of RADOS. It
first needs to get that bucket ID though -- it will cache the bucket
name->ID mapping, but if you have a ton of buckets there could be enough
entries to degrade the cache's effectiveness. (So, you're more likely to
pay that extra disk access lookup.)
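
To make the cache effect concrete, here is a toy sketch, not RGW code. It
assumes a plain LRU cache with a fixed capacity and uniformly random access
across buckets; the class, the key format in rados_key(), and the capacity
of 1000 are all made up for illustration.

```python
import random
from collections import OrderedDict


class BucketIdCache:
    """Toy LRU cache standing in for the gateway's bucket name -> ID cache."""

    def __init__(self, capacity):
        assert capacity >= 1
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, bucket_name):
        if bucket_name in self.entries:
            self.entries.move_to_end(bucket_name)  # mark as recently used
            self.hits += 1
            return self.entries[bucket_name]
        # Miss: this is where the extra metadata read would happen.
        self.misses += 1
        bucket_id = "id-" + bucket_name  # made-up ID format
        self.entries[bucket_name] = bucket_id
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return bucket_id


def rados_key(bucket_id, object_name):
    # Hypothetical stand-in for "bucket ID + mangled object name".
    return bucket_id + "_" + object_name


def hit_rate(num_buckets, capacity, lookups=20000, seed=0):
    """Fraction of lookups served from the cache under uniform access."""
    rng = random.Random(seed)
    cache = BucketIdCache(capacity)
    for _ in range(lookups):
        bucket = "bucket-%d" % rng.randrange(num_buckets)
        rados_key(cache.lookup(bucket), "some-object")
    return cache.hits / lookups


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        print(n, "buckets ->", round(hit_rate(n, capacity=1000), 3))
```

Once the bucket count is well past the cache capacity, nearly every lookup
in this model misses. Real access patterns are rarely uniform, so treat the
numbers as a worst-case illustration only.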

>> The big concern is that we do maintain a per-user list of all their
>> buckets -- which is stored in a single RADOS object -- so if you have an
>> extreme number of buckets that RADOS object could get pretty big and
>> become a bottleneck when creating/removing/listing the buckets. You
>
> Alright. Listing buckets is no problem, that we don’t do. Can you
> say what “pretty big” would be in terms of MB? How much space does a
> bucket record consume in there? Based on that I could run a few numbers.

Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
phone) but I *believe* the bucket name is allowed to be larger than the
rest combined...
More particularly, though, if you've got a single user uploading documents,
each creating a new bucket, then those bucket creates are going to
serialize on this one object.
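
If you want to run a few numbers, a back-of-the-envelope sketch follows. It
takes the rough "kilobyte per bucket" guess above at face value; that figure
is an assumption, not something measured or taken from the source, so adjust
BYTES_PER_ENTRY once you know the real record size.

```python
BYTES_PER_ENTRY = 1024  # rough "kilobyte per bucket" guess, not a measured value


def bucket_list_bytes(num_buckets, bytes_per_entry=BYTES_PER_ENTRY):
    """Estimated size of one user's bucket list object, in bytes."""
    return num_buckets * bytes_per_entry


def as_mib(num_bytes):
    """Convert bytes to MiB."""
    return num_bytes / (1024 * 1024)


if __name__ == "__main__":
    for n in (10000, 1000000, 100000000):
        print("%d buckets -> about %.1f MiB" % (n, as_mib(bucket_list_bytes(n))))
```

Under that assumption the object grows linearly, so a million buckets for
one user would put it in the neighborhood of a gigabyte.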

>> should run your own experiments to figure out what the limits are
>> there; perhaps you have an easy way of sharding up documents into
>> different users.
>
> Good advice. We can do that per distributor (an org unit in our
> software) to at least compartmentalize any potential locking issues
> in this area to that single entity. Still, there would be quite
> a lot of buckets/objects per distributor, so some more detail on
> the above items would be great.
> Thanks a lot!
> Daniel
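
On the sharding idea, a minimal sketch of one way to spread documents over
several gateway users per distributor, so bucket creates do not all land on
a single per-user bucket list. The function name, the "-shardN" naming
scheme, and the default of 4 shards are hypothetical, and the users would
still have to be created in RGW separately.

```python
import hashlib


def rgw_user_for(distributor_id, document_id, shards_per_distributor=4):
    """Pick a stable (hypothetical) gateway user name for a document."""
    # A stable hash keeps a given document on the same user across runs.
    digest = hashlib.sha1(document_id.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % shards_per_distributor
    return "%s-shard%d" % (distributor_id, shard)


if __name__ == "__main__":
    for doc in ("doc-1", "doc-2", "doc-3"):
        print(doc, "->", rgw_user_for("distributor-a", doc))
```

Changing shards_per_distributor later remaps existing documents, so pick
the count up front or store the chosen user alongside the document.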

Software Engineer #42 @ http://inktank.com | http://ceph.com
ceph-users mailing list
