Re: [ceph-users] max_bucket limit -- safe to disable?

Daniel Schneller Tue, 07 Oct 2014 22:19:46 -0700

Hi!

> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.



Thanks. See out pg_num configuration below. We had already suspected
that the 1600 that we had previously (48 OSDs * 100 / triple redundancy)
were not ideal, so we increased the .rgw.buckets pool to 2048.

The number of objects and their size was in an earlier email, but for
completeness I will put them up once again. 

Any other ideas where to look?

==============
for i in $(rados df | awk '{ print $1 }' | grep '^\.'); do
   echo $i; echo -n " - “; 
   ceph osd pool get $i pg_num; 
   echo -n " - “; 
   ceph osd pool get $i pgp_num;
done

.intent-log
 - pg_num: 1600
 - pgp_num: 1600
.log
 - pg_num: 1600
 - pgp_num: 1600
.rgw
 - pg_num: 1600
 - pgp_num: 1600
.rgw.buckets
 - pg_num: 2048
 - pgp_num: 2048
.rgw.buckets.index
 - pg_num: 1600
 - pgp_num: 1600
.rgw.control
 - pg_num: 1600
 - pgp_num: 1600
.rgw.gc
 - pg_num: 1600
 - pgp_num: 1600
.rgw.root
 - pg_num: 100
 - pgp_num: 100
.usage
 - pg_num: 1600
 - pgp_num: 1600
.users
 - pg_num: 1600
 - pgp_num: 1600
.users.email
 - pg_num: 1600
 - pgp_num: 1600
.users.swift
 - pg_num: 1600
 - pgp_num: 1600
.users.uid
 - pg_num: 1600
 - pgp_num: 1600
===============


> .rgw
> =========================
>     KB:     1,966,932
> objects:    9,094,552
>     rd:   195,747,645
>  rd KB:   153,585,472
>     wr:    30,191,844
>  wr KB:    10,751,065
> 
> .rgw.buckets
> =========================
>     KB: 2,038,313,855
> objects:   22,088,103
>     rd:     5,455,123
>  rd KB:   408,416,317
>     wr:   149,377,728
>  wr KB: 1,882,517,472
> 
> .rgw.buckets.index
> =========================
>     KB:             0
> objects:    5,374,376
>     rd:   267,996,778
>  rd KB:   262,626,106
>     wr:   107,142,891
>  wr KB:             0
> 
> .rgw.control
> =========================
>     KB:             0
> objects:            8
>     rd:             0
>  rd KB:             0
>     wr:             0
>  wr KB:             0
> 
> .rgw.gc
> =========================
>     KB:             0
> objects:           32
>     rd:     5,554,407
>  rd KB:     5,713,942
>     wr:     8,355,934
>  wr KB:             0
> 
> .rgw.root
> =========================
>     KB:             1
> objects:            3
>     rd:           524
>  rd KB:           346
>     wr:             3
>  wr KB:             3


Daniel

> On 08 Oct 2014, at 01:03, Yehuda Sadeh <yeh...@redhat.com> wrote:
> 
> This operation stalled quite a bit, seems that it was waiting for the osd:
> 
> 2.547155 7f036ffc7700  1 -- 10.102.4.11:0/1009401 -->
> 10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call
> version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
> 0x7f026802e2c0 con 0x7f040c055ca0
> ...
> 7.619750 7f041ddf4700  1 -- 10.102.4.11:0/1009401 <== osd.32
> 10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043
> [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336
> (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0
> 
> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.
> 
> Yehuda
> 
> 
> On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
> <daniel.schnel...@centerdevice.com> wrote:
>> Hi!
>> 
>> Sorry, I must have missed the enabling of that debug module.
>> However, the test setup has been the same all the time -
>> I only have the one test-application :)
>> 
>> But maybe I phrased it a bit ambiguously when I wrote
>> 
>>> It then continuously created containers - both empty
>>> and such with 10 objects of around 100k random data in them.
>> 
>> 100 kilobytes is the size of a single object, of which we create 10
>> per container. The container gets created first, without any
>> objects, naturally, then 10 objects are added. One of these objects
>> is called “version”, the rest have generated names with a fixed
>> prefix and appended 1-9. The version object is the one I picked
>> for the example logs I sent earlier.
>> 
>> I hope this makes the setup clearer.
>> 
>> Attached you will find the (now more extensive) logs for the outliers
>> again. As you did not say that I garbled the logs, I assume the
>> pre-processing was OK, so I have prepared the new data in a similar
>> fashion, marking the relevant request with <CNT-UUID-FINDME>.
>> 
>> I have not removed any lines in between the beginning of the
>> “interesting” request and its completion to keep all the network
>> traffic log intact. Due to the increased verbosity, I will not post
>> the logs inline, but only attach them gzipped.
>> 
>> As before, should the full data set be needed, I can provide
>> an archived version.
>> 
>> 
>> 
>> 
>> Thanks for your support!
>> Daniel
>> 
>> 
>> 
>> 
>>> On 07 Oct 2014, at 22:45, Yehuda Sadeh <yeh...@redhat.com> wrote:
>>> 
>>> The logs here don't include the messenger (debug ms = 1). It's hard to
>>> tell what going on from looking at the outliers. Also, in your
>>> previous mail you described a different benchmark, you tested writing
>>> large number of objects into a single bucket, whereas in this test
>>> you're testing multiple bucket creations, which have a completely
>>> different characteristics.
>>> 
>>> 
>>> On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
>>> <daniel.schnel...@centerdevice.com> wrote:
>>>> Hi!
>>>> 
>>>> I have re-run our test as follows:
>>>> 
>>>> * 4 Rados Gateways, on 4 baremetal machines which have
>>>> a total of 48 spinning rust OSDs.
>>>> 
>>>> * Benchmark run on a virtual machine talking to HAProxy
>>>> which balances the requests across the 4 Rados GWs.
>>>> 
>>>> * Three instances of the benchmark run in parallel. Each
>>>> instance creates 1000 containers, puts 11 objects into
>>>> each container. Once they have all been created, each
>>>> instances deletes its own containers again.
>>>> 
>>>> I configured one of the radosgws with the debug levels
>>>> you requested. The tests produced quite an amount of data
>>>> (approx. 1GB of text), so I took the liberty to
>>>> pre-process that a bit.
>>>> 
>>>> In this run we landed at around 1.2s per container
>>>> created (including the objects in them) on average.
>>>> This was slightly better than the 1.6s we witnessed
>>>> before, but that test ran for much longer, so there might
>>>> have been some queue-up effect.
>>>> 
>>>> Interestingly enough the average is actually somewhat
>>>> misleading. The logs below show the creation of one
>>>> object in a container each, one being the fastest of this
>>>> benchmark (at least on the debug-enabled radosgw), one
>>>> being the slowest.
>>>> 
>>>> The fastest one was completed in 0.013s, the longest one
>>>> took 4.93s(!).
>>>> 
>>>> The attached logs are cleaned up so that they each show
>>>> just a single request and replaced longish, but constant
>>>> information with placeholders. Our container names are
>>>> of the form “stresstest-xxxxxxxxxxx” which I shortened
>>>> to “<CNT-UUID>” for brevity. Also, I removed the redundant
>>>> prefix (date, hour, minute of day).
>>>> 
>>>> The column before the log level looked like a thread-id.
>>>> As I focused on a single request, I removed all the lines
>>>> that did not match the same id, replacing the actual value
>>>> with “<ID>”. That makes the logs much easier to read and
>>>> understand.
>>>> 
>>>> Just in case I might have removed too much information
>>>> for the logs to be useful, the complete log is available
>>>> in BZIP2 compressed form for download. Just let me know
>>>> if you need it, then I will provide a link via direct email.
>>>> 
>>>> To me it seems like there might indeed be a contention
>>>> issue. It would be interesting to know, if this is correct
>>>> and if there are any settings that we could adjust to
>>>> alleviate the issue.
>>>> 
>>>> Daniel
>>>> 
>>>> ==============================
>>>> 
>>>> ➜  ~  cat rados_shortest.txt
>>>> 21.431185 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
>>>> 21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140
>>>> 21.431188 <ID> 20 REMOTE_PORT=44007
>>>> 21.431189 <ID> 20 REQUEST_METHOD=PUT
>>>> 21.431190 <ID> 20 REQUEST_SCHEME=https
>>>> 21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>>> 21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>>> 21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>>> 21.431194 <ID> 20
>>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>>> 21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>>> 21.431196 <ID> 20 SERVER_ADDR=10.102.9.11
>>>> 21.431197 <ID> 20 SERVER_ADMIN=[no address given]
>>>> 21.431198 <ID> 20 SERVER_NAME=localhost
>>>> 21.431199 <ID> 20 SERVER_PORT=8405
>>>> 21.431200 <ID> 20 SERVER_PORT_SECURE=443
>>>> 21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>>> 21.431202 <ID> 20 SERVER_SIGNATURE=
>>>> 21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>>> 21.431205 <ID>  1 ====== starting new request req=0x7f038c019e90 =====
>>>> 21.431219 <ID>  2 req 980641:0.000015::PUT
>>>> /swift/v1/<CNT-UUID>/version::initializing
>>>> 21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>>> 21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>>> 21.431269 <ID>  2 req 980641:0.000065:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version::getting op
>>>> 21.431274 <ID>  2 req 980641:0.000070:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>>> 21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>>> obj=.users.swift:documentstore:swift state=0x7f03080f31e8 
>>>> s->prefetch_data=0
>>>> 21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720
>>>> obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0
>>>> 21.431374 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 21.431401 <ID> 10 swift_user=documentstore:swift
>>>> 21.431416 <ID> 20 build_token
>>>> token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17
>>>> 21.431467 <ID>  2 req 980641:0.000262:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>>> 21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID>
>>>> state=0x7f03080f31e8 s->prefetch_data=0
>>>> 21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss 
>>>> (requested=22,
>>>> cached=19)
>>>> 21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>>> 21.433106 <ID> 10 removing entry:
>>>> name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache
>>>> LRU
>>>> 21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>>> 21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.433122 <ID> 20 Read xattr: user.rgw.idtag
>>>> 21.433124 <ID> 20 Read xattr: user.rgw.manifest
>>>> 21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>>> 21.433141 <ID> 20 rgw_get_bucket_info: bucket instance:
>>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911])
>>>> 21.433148 <ID> 20 reading from
>>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> 21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250
>>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> state=0x7f0308005778 s->prefetch_data=0
>>>> 21.433185 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss
>>>> (requested=22, cached=19)
>>>> 21.434632 <ID> 10 cache put:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911
>>>> 21.434650 <ID> 10 moving
>>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end
>>>> 21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>>> 21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 21.434667 <ID> 20 Read xattr: user.rgw.acl
>>>> 21.434668 <ID> 20 Read xattr: user.rgw.idtag
>>>> 21.434669 <ID> 20 Read xattr: user.rgw.manifest
>>>> 21.434675 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit
>>>> 21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/";><Owner><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>>> 21.434710 <ID>  2 req 980641:0.003506:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>>> 21.434716 <ID>  2 req 980641:0.003512:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>>> 21.434719 <ID> 20 required_mask= 2 user.op_mask=7
>>>> 21.434720 <ID>  2 req 980641:0.003516:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>>> 21.434723 <ID>  5 Searching permissions for uid=documentstore mask=50
>>>> 21.434725 <ID>  5 Found permission: 15
>>>> 21.434726 <ID>  5 Searching permissions for group=1 mask=50
>>>> 21.434727 <ID>  5 Permissions for group not found
>>>> 21.434729 <ID>  5 Searching permissions for group=2 mask=50
>>>> 21.434730 <ID>  5 Permissions for group not found
>>>> 21.434731 <ID>  5 Getting permissions id=documentstore owner=documentstore
>>>> perm=2
>>>> 21.434732 <ID> 10  uid=documentstore requested perm (type)=2, policy 
>>>> perm=2,
>>>> user_perm_mask=2, acl perm=2
>>>> 21.434734 <ID>  2 req 980641:0.003530:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>>> 21.434737 <ID>  2 req 980641:0.003533:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>>> 21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version
>>>> state=0x7f03080c5378 s->prefetch_data=0
>>>> 21.436144 <ID> 10 setting object write_tag=default.78418684.980641
>>>> 21.444613 <ID>  2 req 980641:0.013408:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>> 
>>>> ==============================
>>>> 
>>>> ➜  ~  cat rados_longest.txt
>>>> 31.886128 <ID> 20 QUERY_STRING=page=swift&params=/v1/<CNT-UUID>/version
>>>> 31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140
>>>> 31.886130 <ID> 20 REMOTE_PORT=46714
>>>> 31.886131 <ID> 20 REQUEST_METHOD=PUT
>>>> 31.886132 <ID> 20 REQUEST_SCHEME=https
>>>> 31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version
>>>> 31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi
>>>> 31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version
>>>> 31.886137 <ID> 20
>>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version
>>>> 31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version
>>>> 31.886139 <ID> 20 SERVER_ADDR=10.102.9.11
>>>> 31.886140 <ID> 20 SERVER_ADMIN=[no address given]
>>>> 31.886141 <ID> 20 SERVER_NAME=localhost
>>>> 31.886143 <ID> 20 SERVER_PORT=8405
>>>> 31.886144 <ID> 20 SERVER_PORT_SECURE=443
>>>> 31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1
>>>> 31.886146 <ID> 20 SERVER_SIGNATURE=
>>>> 31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu)
>>>> 31.886148 <ID>  1 ====== starting new request req=0x7f038c024c50 =====
>>>> 31.886162 <ID>  2 req 983095:0.000013::PUT
>>>> /swift/v1/<CNT-UUID>/version::initializing
>>>> 31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version
>>>> 31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID>
>>>> 31.886203 <ID>  2 req 983095:0.000055:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version::getting op
>>>> 31.886208 <ID>  2 req 983095:0.000060:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing
>>>> 31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>>> obj=.users.swift:documentstore:swift state=0x7f02b007ac18 
>>>> s->prefetch_data=0
>>>> 31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit
>>>> 31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770
>>>> obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0
>>>> 31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit
>>>> 31.886329 <ID> 10 swift_user=documentstore:swift
>>>> 31.886343 <ID> 20 build_token
>>>> token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f
>>>> 31.886393 <ID>  2 req 983095:0.000245:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions
>>>> 31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID>
>>>> state=0x7f02b007ac18 s->prefetch_data=0
>>>> 31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss 
>>>> (requested=22,
>>>> cached=19)
>>>> 36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID>
>>>> 36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end
>>>> 36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 36.746431 <ID> 20 Read xattr: user.rgw.idtag
>>>> 36.746433 <ID> 20 Read xattr: user.rgw.manifest
>>>> 36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit
>>>> 36.746481 <ID> 20 rgw_get_bucket_info: bucket instance:
>>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116])
>>>> 36.746491 <ID> 20 reading from
>>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> 36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250
>>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> state=0x7f02b00ce638 s->prefetch_data=0
>>>> 36.746585 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss
>>>> (requested=22, cached=19)
>>>> 36.747938 <ID> 10 cache put:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116
>>>> 36.747955 <ID> 10 moving
>>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU end
>>>> 36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177
>>>> 36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty
>>>> 36.747975 <ID> 20 Read xattr: user.rgw.acl
>>>> 36.747977 <ID> 20 Read xattr: user.rgw.idtag
>>>> 36.747978 <ID> 20 Read xattr: user.rgw.manifest
>>>> 36.747985 <ID> 10 cache get:
>>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit
>>>> 36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy
>>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/";><Owner><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee
>>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
>>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document
>>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
>>>> 36.748037 <ID>  2 req 983095:4.861888:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:init op
>>>> 36.748043 <ID>  2 req 983095:4.861895:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask
>>>> 36.748046 <ID> 20 required_mask= 2 user.op_mask=7
>>>> 36.748050 <ID>  2 req 983095:4.861902:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions
>>>> 36.748054 <ID>  5 Searching permissions for uid=documentstore mask=50
>>>> 36.748056 <ID>  5 Found permission: 15
>>>> 36.748058 <ID>  5 Searching permissions for group=1 mask=50
>>>> 36.748060 <ID>  5 Permissions for group not found
>>>> 36.748061 <ID>  5 Searching permissions for group=2 mask=50
>>>> 36.748063 <ID>  5 Permissions for group not found
>>>> 36.748064 <ID>  5 Getting permissions id=documentstore owner=documentstore
>>>> perm=2
>>>> 36.748066 <ID> 10  uid=documentstore requested perm (type)=2, policy 
>>>> perm=2,
>>>> user_perm_mask=2, acl perm=2
>>>> 36.748069 <ID>  2 req 983095:4.861921:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params
>>>> 36.748072 <ID>  2 req 983095:4.861924:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:executing
>>>> 36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version
>>>> state=0x7f02b0042618 s->prefetch_data=0
>>>> 36.802077 <ID> 10 setting object write_tag=default.78418684.983095
>>>> 36.818727 <ID>  2 req 983095:4.932579:swift:PUT
>>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201
>>>> 
>>>> ==============================
>>>> 
>>>> 
>>>> --
>>>> Daniel Schneller
>>>> Mobile Development Lead
>>>> 
>>>> CenterDevice GmbH                  | Merscheider Straße 1
>>>>                                  | 42699 Solingen
>>>> tel: +49 1754155711                | Deutschland
>>>> daniel.schnel...@centerdevice.com  | www.centerdevice.com
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 06 Oct 2014, at 19:26, Yehuda Sadeh <yeh...@redhat.com> wrote:
>>>> 
>>>> It'd be interesting to see which rados operation is slowing down the
>>>> requests. Can you provide a log dump of a request (with 'debug rgw =
>>>> 20', and 'debug ms = 1'). This might give us a better idea as to
>>>> what's going on.
>>>> 
>>>> Thanks,
>>>> Yehuda
>>>> 
>>>> On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller
>>>> <daniel.schnel...@centerdevice.com> wrote:
>>>> 
>>>> Hi again!
>>>> 
>>>> We have done some tests regarding the limits of storing lots and
>>>> lots of buckets through Rados Gateway into Ceph.
>>>> 
>>>> Our test used a single user for which we removed the default max
>>>> buckets limit. It then continuously created containers - both empty
>>>> and such with 10 objects of around 100k random data in them.
>>>> 
>>>> With 3 parallel processes we saw relatively consistent time of
>>>> about   500-700ms    per such container.
>>>> 
>>>> This kept steady until we reached approx. 3 million containers
>>>> after which the time per insert sharply went up to currently
>>>> around   1600ms   and rising. Due to some hiccups with network
>>>> equipment the tests were aborted a few times, but then resumed without
>>>> deleting any of the previous runs created containers, so the actual
>>>> number might be 2.8 or 3.2 million, but still in that ballpark.
>>>> We aborted the test here.
>>>> 
>>>> Judging by the advice given earlier (see quoted mail below) that
>>>> we might hit a limit on some per-user data structures, we created
>>>> another user account, removed its max-bucket limit as well and
>>>> restarted the benchmark with that one, _expecting_ the times to be
>>>> down to the original range of 500-700ms.
>>>> 
>>>> However, what we are seeing is that the times stay at the   1600ms
>>>> and higher levels even for that fresh account.
>>>> 
>>>> Here is the output of `rados df`, reformatted to fit the email.
>>>> clones, degraded and unfound were 0 in all cases and have been
>>>> left out for clarity:
>>>> 
>>>> .rgw
>>>> =========================
>>>>     KB:     1,966,932
>>>> objects:     9,094,552
>>>>     rd:   195,747,645
>>>>  rd KB:   153,585,472
>>>>     wr:    30,191,844
>>>>  wr KB:    10,751,065
>>>> 
>>>> .rgw.buckets
>>>> =========================
>>>>     KB: 2,038,313,855
>>>> objects:    22,088,103
>>>>     rd:     5,455,123
>>>>  rd KB:   408,416,317
>>>>     wr:   149,377,728
>>>>  wr KB: 1,882,517,472
>>>> 
>>>> .rgw.buckets.index
>>>> =========================
>>>>     KB:             0
>>>> objects:     5,374,376
>>>>     rd:   267,996,778
>>>>  rd KB:   262,626,106
>>>>     wr:   107,142,891
>>>>  wr KB:             0
>>>> 
>>>> .rgw.control
>>>> =========================
>>>>     KB:             0
>>>> objects:             8
>>>>     rd:             0
>>>>  rd KB:             0
>>>>     wr:             0
>>>>  wr KB:             0
>>>> 
>>>> .rgw.gc
>>>> =========================
>>>>     KB:             0
>>>> objects:            32
>>>>     rd:     5,554,407
>>>>  rd KB:     5,713,942
>>>>     wr:     8,355,934
>>>>  wr KB:             0
>>>> 
>>>> .rgw.root
>>>> =========================
>>>>     KB:             1
>>>> objects:             3
>>>>     rd:           524
>>>>  rd KB:           346
>>>>     wr:             3
>>>>  wr KB:             3
>>>> 
>>>> 
>>>> We would very much like to understand what is going on here
>>>> in order to decide if Rados Gateway is a viable option to base
>>>> our production system on (where we expect similar counts
>>>> as in the benchmark), or if we need to investigate using librados
>>>> directly which we would like to avoid if possible.
>>>> 
>>>> Any advice on what configuration parameters to check or
>>>> which additional information to provide to analyze this would be
>>>> very much welcome.
>>>> 
>>>> Cheers,
>>>> Daniel
>>>> 
>>>> 
>>>> --
>>>> Daniel Schneller
>>>> Mobile Development Lead
>>>> 
>>>> CenterDevice GmbH                  | Merscheider Straße 1
>>>>                                | 42699 Solingen
>>>> tel: +49 1754155711                | Deutschland
>>>> daniel.schnel...@centerdevice.com  | www.centerdevice.com
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 10 Sep 2014, at 19:42, Gregory Farnum <g...@inktank.com> wrote:
>>>> 
>>>> On Wednesday, September 10, 2014, Daniel Schneller
>>>> <daniel.schnel...@centerdevice.com> wrote:
>>>> 
>>>> 
>>>> On 09 Sep 2014, at 21:43, Gregory Farnum <g...@inktank.com> wrote:
>>>> 
>>>> 
>>>> Yehuda can talk about this with more expertise than I can, but I think
>>>> it should be basically fine. By creating so many buckets you're
>>>> decreasing the effectiveness of RGW's metadata caching, which means
>>>> 
>>>> the initial lookup in a particular bucket might take longer.
>>>> 
>>>> 
>>>> Thanks for your thoughts. With “initial lookup in a particular bucket”
>>>> do you mean accessing any of the objects in a bucket? If we directly
>>>> access the object (not enumerating the buckets content), would that
>>>> still be an issue?
>>>> Just trying to understand the inner workings a bit better to make
>>>> more educated guesses :)
>>>> 
>>>> 
>>>> 
>>>> When doing an object lookup, the gateway combines the "bucket ID" with a
>>>> mangled version of the object name to try and do a read out of RADOS. It
>>>> first needs to get that bucket ID though -- it will cache an the bucket
>>>> name->ID mapping, but if you have a ton of buckets there could be enough
>>>> entries to degrade the cache's effectiveness. (So, you're more likely to 
>>>> pay
>>>> that extra disk access lookup.)
>>>> 
>>>> 
>>>> 
>>>> 
>>>> The big concern is that we do maintain a per-user list of all their
>>>> buckets — which is stored in a single RADOS object — so if you have an
>>>> extreme number of buckets that RADOS object could get pretty big and
>>>> become a bottleneck when creating/removing/listing the buckets. You
>>>> 
>>>> 
>>>> Alright. Listing buckets is no problem, that we don’t do. Can you
>>>> say what “pretty big” would be in terms of MB? How much space does a
>>>> bucket record consume in there? Based on that I could run a few numbers.
>>>> 
>>>> 
>>>> 
>>>> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my
>>>> phone) but I *believe* the bucket name is allowed to be larger than the 
>>>> rest
>>>> combined...
>>>> More particularly, though, if you've got a single user uploading documents,
>>>> each creating a new bucket, then those bucket creates are going to 
>>>> serialize
>>>> on this one object.
>>>> -Greg
>>>> 
>>>> 
>>>> 
>>>> 
>>>> should run your own experiments to figure out what the limits are
>>>> there; perhaps you have an easy way of sharding up documents into
>>>> different users.
>>>> 
>>>> 
>>>> Good advice. We can do that per distributor (an org unit in our
>>>> software) to at least compartmentalize any potential locking issues
>>>> in this area to that single entity. Still, there would be quite
>>>> a lot of buckets/objects per distributor, so some more detail on
>>>> the above items would be great.
>>>> 
>>>> Thanks a lot!
>>>> 
>>>> 
>>>> Daniel
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> 
>> 
>>

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] max_bucket limit -- safe to disable?

Reply via email to