Hi!

> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.

Thanks. We had already suspected that the 1600 PGs we had previously
(48 OSDs * 100, divided by 3 for triple redundancy) were not ideal, so
we increased the .rgw.buckets pool to 2048.
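For reference, growing a pool's placement groups is a two-step change:
raise pg_num first (which creates the new PGs), then pgp_num (which lets
data rebalance onto them). The snippet below is only a sketch of that
procedure with placeholder values, not a verbatim record of what we ran:

==============
# Sketch only -- pool name and target count are placeholders.
POOL=.rgw.buckets
TARGET=2048

ceph osd pool get $POOL pg_num             # current PG count
ceph osd pool set $POOL pg_num $TARGET     # create the additional PGs
ceph osd pool set $POOL pgp_num $TARGET    # allow data to rebalance onto them
==============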
See our pg_num configuration below. The number of objects and their
sizes were in an earlier email, but for completeness I will put them up
once again. Any other ideas where to look?

==============
for i in $(rados df | awk '{ print $1 }' | grep '^\.'); do echo $i; echo -n " - "; ceph osd pool get $i pg_num; echo -n " - "; ceph osd pool get $i pgp_num; done

.intent-log
 - pg_num: 1600
 - pgp_num: 1600
.log
 - pg_num: 1600
 - pgp_num: 1600
.rgw
 - pg_num: 1600
 - pgp_num: 1600
.rgw.buckets
 - pg_num: 2048
 - pgp_num: 2048
.rgw.buckets.index
 - pg_num: 1600
 - pgp_num: 1600
.rgw.control
 - pg_num: 1600
 - pgp_num: 1600
.rgw.gc
 - pg_num: 1600
 - pgp_num: 1600
.rgw.root
 - pg_num: 100
 - pgp_num: 100
.usage
 - pg_num: 1600
 - pgp_num: 1600
.users
 - pg_num: 1600
 - pgp_num: 1600
.users.email
 - pg_num: 1600
 - pgp_num: 1600
.users.swift
 - pg_num: 1600
 - pgp_num: 1600
.users.uid
 - pg_num: 1600
 - pgp_num: 1600
===============

> .rgw
> =========================
> KB: 1,966,932
> objects: 9,094,552
> rd: 195,747,645
> rd KB: 153,585,472
> wr: 30,191,844
> wr KB: 10,751,065
>
> .rgw.buckets
> =========================
> KB: 2,038,313,855
> objects: 22,088,103
> rd: 5,455,123
> rd KB: 408,416,317
> wr: 149,377,728
> wr KB: 1,882,517,472
>
> .rgw.buckets.index
> =========================
> KB: 0
> objects: 5,374,376
> rd: 267,996,778
> rd KB: 262,626,106
> wr: 107,142,891
> wr KB: 0
>
> .rgw.control
> =========================
> KB: 0
> objects: 8
> rd: 0
> rd KB: 0
> wr: 0
> wr KB: 0
>
> .rgw.gc
> =========================
> KB: 0
> objects: 32
> rd: 5,554,407
> rd KB: 5,713,942
> wr: 8,355,934
> wr KB: 0
>
> .rgw.root
> =========================
> KB: 1
> objects: 3
> rd: 524
> rd KB: 346
> wr: 3
> wr KB: 3

Daniel

> On 08 Oct 2014, at 01:03, Yehuda Sadeh <yeh...@redhat.com> wrote:
>
> This operation stalled quite a bit, seems that it was waiting for the osd:
>
> 2.547155 7f036ffc7700 1 -- 10.102.4.11:0/1009401 -->
> 10.102.4.14:6809/7428 -- osd_op(client.78418684.0:27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043 [call
> version.read,getxattrs,stat] 5.3b7d1197 ack+read e16034) v4 -- ?+0
> 0x7f026802e2c0 con 0x7f040c055ca0
> ...
> 7.619750 7f041ddf4700 1 -- 10.102.4.11:0/1009401 <== osd.32
> 10.102.4.14:6809/7428 208252 ==== osd_op_reply(27514711
> .bucket.meta.<CNT-UUID-FINDME>:default.78418684.122043
> [call,getxattrs,stat] v0'0 uv6371 ondisk = 0) v6 ==== 338+0+336
> (3685145659 0 4232894755) 0x7f00e430f540 con 0x7f040c055ca0
>
> By looking at these logs it seems that there are only 8 pgs on the
> .rgw pool, if this is correct then you may want to change that
> considering your workload.
>
> Yehuda
>
>
> On Tue, Oct 7, 2014 at 3:46 PM, Daniel Schneller
> <daniel.schnel...@centerdevice.com> wrote:
>> Hi!
>>
>> Sorry, I must have missed the enabling of that debug module.
>> However, the test setup has been the same all the time -
>> I only have the one test-application :)
>>
>> But maybe I phrased it a bit ambiguously when I wrote
>>
>>> It then continuously created containers - both empty
>>> and such with 10 objects of around 100k random data in them.
>>
>> 100 kilobytes is the size of a single object, of which we create 10
>> per container. The container gets created first, without any
>> objects, naturally, then 10 objects are added.
>> One of these objects is called “version”, the rest have generated
>> names with a fixed prefix and appended 1-9. The version object is
>> the one I picked for the example logs I sent earlier.
>>
>> I hope this makes the setup clearer.
>>
>> Attached you will find the (now more extensive) logs for the outliers
>> again. As you did not say that I garbled the logs, I assume the
>> pre-processing was OK, so I have prepared the new data in a similar
>> fashion, marking the relevant request with <CNT-UUID-FINDME>.
>>
>> I have not removed any lines in between the beginning of the
>> “interesting” request and its completion to keep all the network
>> traffic log intact. Due to the increased verbosity, I will not post
>> the logs inline, but only attach them gzipped.
>>
>> As before, should the full data set be needed, I can provide
>> an archived version.
>>
>> Thanks for your support!
>> Daniel
>>
>>> On 07 Oct 2014, at 22:45, Yehuda Sadeh <yeh...@redhat.com> wrote:
>>>
>>> The logs here don't include the messenger (debug ms = 1). It's hard to
>>> tell what going on from looking at the outliers. Also, in your
>>> previous mail you described a different benchmark, you tested writing
>>> large number of objects into a single bucket, whereas in this test
>>> you're testing multiple bucket creations, which have a completely
>>> different characteristics.
>>>
>>>
>>> On Tue, Oct 7, 2014 at 1:26 PM, Daniel Schneller
>>> <daniel.schnel...@centerdevice.com> wrote:
>>>> Hi!
>>>>
>>>> I have re-run our test as follows:
>>>>
>>>> * 4 Rados Gateways, on 4 baremetal machines which have
>>>>   a total of 48 spinning rust OSDs.
>>>>
>>>> * Benchmark run on a virtual machine talking to HAProxy
>>>>   which balances the requests across the 4 Rados GWs.
>>>>
>>>> * Three instances of the benchmark run in parallel. Each
>>>>   instance creates 1000 containers, puts 11 objects into
>>>>   each container. Once they have all been created, each
>>>>   instances deletes its own containers again.
>>>>
>>>> I configured one of the radosgws with the debug levels
>>>> you requested. The tests produced quite an amount of data
>>>> (approx. 1GB of text), so I took the liberty to
>>>> pre-process that a bit.
>>>>
>>>> In this run we landed at around 1.2s per container
>>>> created (including the objects in them) on average.
>>>> This was slightly better than the 1.6s we witnessed
>>>> before, but that test ran for much longer, so there might
>>>> have been some queue-up effect.
>>>>
>>>> Interestingly enough the average is actually somewhat
>>>> misleading. The logs below show the creation of one
>>>> object in a container each, one being the fastest of this
>>>> benchmark (at least on the debug-enabled radosgw), one
>>>> being the slowest.
>>>>
>>>> The fastest one was completed in 0.013s, the longest one
>>>> took 4.93s(!).
>>>>
>>>> The attached logs are cleaned up so that they each show
>>>> just a single request and replaced longish, but constant
>>>> information with placeholders. Our container names are
>>>> of the form “stresstest-xxxxxxxxxxx” which I shortened
>>>> to “<CNT-UUID>” for brevity. Also, I removed the redundant
>>>> prefix (date, hour, minute of day).
>>>>
>>>> The column before the log level looked like a thread-id.
>>>> As I focused on a single request, I removed all the lines
>>>> that did not match the same id, replacing the actual value
>>>> with “<ID>”. That makes the logs much easier to read and
>>>> understand.
>>>> >>>> Just in case I might have removed too much information >>>> for the logs to be useful, the complete log is available >>>> in BZIP2 compressed form for download. Just let me know >>>> if you need it, then I will provide a link via direct email. >>>> >>>> To me it seems like there might indeed be a contention >>>> issue. It would be interesting to know, if this is correct >>>> and if there are any settings that we could adjust to >>>> alleviate the issue. >>>> >>>> Daniel >>>> >>>> ============================== >>>> >>>> ➜ ~ cat rados_shortest.txt >>>> 21.431185 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version >>>> 21.431187 <ID> 20 REMOTE_ADDR=10.102.8.140 >>>> 21.431188 <ID> 20 REMOTE_PORT=44007 >>>> 21.431189 <ID> 20 REQUEST_METHOD=PUT >>>> 21.431190 <ID> 20 REQUEST_SCHEME=https >>>> 21.431191 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version >>>> 21.431192 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi >>>> 21.431193 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version >>>> 21.431194 <ID> 20 >>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version >>>> 21.431195 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version >>>> 21.431196 <ID> 20 SERVER_ADDR=10.102.9.11 >>>> 21.431197 <ID> 20 SERVER_ADMIN=[no address given] >>>> 21.431198 <ID> 20 SERVER_NAME=localhost >>>> 21.431199 <ID> 20 SERVER_PORT=8405 >>>> 21.431200 <ID> 20 SERVER_PORT_SECURE=443 >>>> 21.431201 <ID> 20 SERVER_PROTOCOL=HTTP/1.1 >>>> 21.431202 <ID> 20 SERVER_SIGNATURE= >>>> 21.431203 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu) >>>> 21.431205 <ID> 1 ====== starting new request req=0x7f038c019e90 ===== >>>> 21.431219 <ID> 2 req 980641:0.000015::PUT >>>> /swift/v1/<CNT-UUID>/version::initializing >>>> 21.431259 <ID> 10 ver=v1 first=<CNT-UUID> req=version >>>> 21.431265 <ID> 10 s->object=version s->bucket=<CNT-UUID> >>>> 21.431269 <ID> 2 req 980641:0.000065:swift:PUT >>>> /swift/v1/<CNT-UUID>/version::getting op >>>> 21.431274 <ID> 2 req 980641:0.000070:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing >>>> 21.431321 <ID> 20 get_obj_state: rctx=0x7f030800b720 >>>> obj=.users.swift:documentstore:swift state=0x7f03080f31e8 >>>> s->prefetch_data=0 >>>> 21.431332 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit >>>> 21.431338 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 21.431344 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit >>>> 21.431369 <ID> 20 get_obj_state: rctx=0x7f030800b720 >>>> obj=.users.uid:documentstore state=0x7f03080f31e8 s->prefetch_data=0 >>>> 21.431374 <ID> 10 cache get: name=.users.uid+documentstore : hit >>>> 21.431378 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 21.431382 <ID> 10 cache get: name=.users.uid+documentstore : hit >>>> 21.431401 <ID> 10 swift_user=documentstore:swift >>>> 21.431416 <ID> 20 build_token >>>> token=13000000646f63756d656e7473746f72653a737769667406a4b2ba3999f8a84f45355438d8ff17 >>>> 21.431467 <ID> 2 req 980641:0.000262:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions >>>> 21.431493 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=.rgw:<CNT-UUID> >>>> state=0x7f03080f31e8 s->prefetch_data=0 >>>> 21.431508 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss >>>> (requested=22, >>>> cached=19) >>>> 21.433081 <ID> 10 cache put: name=.rgw+<CNT-UUID> >>>> 21.433106 <ID> 10 removing entry: >>>> name=.rgw+stresstest-ab9ee3e2-dcf5-4a5b-ab40-931d94c7784038242 from cache >>>> LRU >>>> 21.433114 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end >>>> 21.433120 <ID> 20 get_obj_state: s->obj_tag was set empty 
>>>> 21.433122 <ID> 20 Read xattr: user.rgw.idtag >>>> 21.433124 <ID> 20 Read xattr: user.rgw.manifest >>>> 21.433129 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit >>>> 21.433141 <ID> 20 rgw_get_bucket_info: bucket instance: >>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.118911]) >>>> 21.433148 <ID> 20 reading from >>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911 >>>> 21.433169 <ID> 20 get_obj_state: rctx=0x7f03837ed250 >>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.118911 >>>> state=0x7f0308005778 s->prefetch_data=0 >>>> 21.433185 <ID> 10 cache get: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : type miss >>>> (requested=22, cached=19) >>>> 21.434632 <ID> 10 cache put: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 >>>> 21.434650 <ID> 10 moving >>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 to cache LRU end >>>> 21.434657 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177 >>>> 21.434664 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 21.434667 <ID> 20 Read xattr: user.rgw.acl >>>> 21.434668 <ID> 20 Read xattr: user.rgw.idtag >>>> 21.434669 <ID> 20 Read xattr: user.rgw.manifest >>>> 21.434675 <ID> 10 cache get: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.118911 : hit >>>> 21.434701 <ID> 15 Read AccessControlPolicy<AccessControlPolicy >>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document >>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee >>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document >>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy> >>>> 21.434710 <ID> 2 req 980641:0.003506:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:init op >>>> 21.434716 <ID> 2 req 980641:0.003512:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask >>>> 21.434719 <ID> 20 required_mask= 2 user.op_mask=7 >>>> 21.434720 <ID> 2 req 980641:0.003516:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions >>>> 21.434723 <ID> 5 Searching permissions for uid=documentstore mask=50 >>>> 21.434725 <ID> 5 Found permission: 15 >>>> 21.434726 <ID> 5 Searching permissions for group=1 mask=50 >>>> 21.434727 <ID> 5 Permissions for group not found >>>> 21.434729 <ID> 5 Searching permissions for group=2 mask=50 >>>> 21.434730 <ID> 5 Permissions for group not found >>>> 21.434731 <ID> 5 Getting permissions id=documentstore owner=documentstore >>>> perm=2 >>>> 21.434732 <ID> 10 uid=documentstore requested perm (type)=2, policy >>>> perm=2, >>>> user_perm_mask=2, acl perm=2 >>>> 21.434734 <ID> 2 req 980641:0.003530:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params >>>> 21.434737 <ID> 2 req 980641:0.003533:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:executing >>>> 21.434850 <ID> 20 get_obj_state: rctx=0x7f03837ed250 obj=<CNT-UUID>:version >>>> state=0x7f03080c5378 s->prefetch_data=0 >>>> 21.436144 <ID> 10 setting object write_tag=default.78418684.980641 >>>> 21.444613 <ID> 2 req 980641:0.013408:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201 >>>> >>>> ============================== >>>> >>>> ➜ ~ cat rados_longest.txt >>>> 31.886128 <ID> 20 QUERY_STRING=page=swift¶ms=/v1/<CNT-UUID>/version >>>> 31.886129 <ID> 20 REMOTE_ADDR=10.102.8.140 >>>> 31.886130 <ID> 20 REMOTE_PORT=46714 >>>> 31.886131 <ID> 20 REQUEST_METHOD=PUT >>>> 
31.886132 <ID> 20 REQUEST_SCHEME=https >>>> 31.886134 <ID> 20 REQUEST_URI=/swift/v1/<CNT-UUID>/version >>>> 31.886135 <ID> 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi >>>> 31.886136 <ID> 20 SCRIPT_NAME=/swift/v1/<CNT-UUID>/version >>>> 31.886137 <ID> 20 >>>> SCRIPT_URI=https://localhost:8405/swift/v1/<CNT-UUID>/version >>>> 31.886138 <ID> 20 SCRIPT_URL=/swift/v1/<CNT-UUID>/version >>>> 31.886139 <ID> 20 SERVER_ADDR=10.102.9.11 >>>> 31.886140 <ID> 20 SERVER_ADMIN=[no address given] >>>> 31.886141 <ID> 20 SERVER_NAME=localhost >>>> 31.886143 <ID> 20 SERVER_PORT=8405 >>>> 31.886144 <ID> 20 SERVER_PORT_SECURE=443 >>>> 31.886145 <ID> 20 SERVER_PROTOCOL=HTTP/1.1 >>>> 31.886146 <ID> 20 SERVER_SIGNATURE= >>>> 31.886147 <ID> 20 SERVER_SOFTWARE=Apache/2.4.7 (Ubuntu) >>>> 31.886148 <ID> 1 ====== starting new request req=0x7f038c024c50 ===== >>>> 31.886162 <ID> 2 req 983095:0.000013::PUT >>>> /swift/v1/<CNT-UUID>/version::initializing >>>> 31.886195 <ID> 10 ver=v1 first=<CNT-UUID> req=version >>>> 31.886200 <ID> 10 s->object=version s->bucket=<CNT-UUID> >>>> 31.886203 <ID> 2 req 983095:0.000055:swift:PUT >>>> /swift/v1/<CNT-UUID>/version::getting op >>>> 31.886208 <ID> 2 req 983095:0.000060:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:authorizing >>>> 31.886242 <ID> 20 get_obj_state: rctx=0x7f02b0062770 >>>> obj=.users.swift:documentstore:swift state=0x7f02b007ac18 >>>> s->prefetch_data=0 >>>> 31.886250 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit >>>> 31.886255 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 31.886260 <ID> 10 cache get: name=.users.swift+documentstore:swift : hit >>>> 31.886297 <ID> 20 get_obj_state: rctx=0x7f02b0062770 >>>> obj=.users.uid:documentstore state=0x7f02b007ac18 s->prefetch_data=0 >>>> 31.886303 <ID> 10 cache get: name=.users.uid+documentstore : hit >>>> 31.886308 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 31.886312 <ID> 10 cache get: name=.users.uid+documentstore : hit >>>> 31.886329 <ID> 10 swift_user=documentstore:swift >>>> 31.886343 <ID> 20 build_token >>>> token=13000000646f63756d656e7473746f72653a737769667410f2006ed65dbbaa4f453554207a0f1f >>>> 31.886393 <ID> 2 req 983095:0.000245:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:reading permissions >>>> 31.886419 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=.rgw:<CNT-UUID> >>>> state=0x7f02b007ac18 s->prefetch_data=0 >>>> 31.886430 <ID> 10 cache get: name=.rgw+<CNT-UUID> : type miss >>>> (requested=22, >>>> cached=19) >>>> 36.746327 <ID> 10 cache put: name=.rgw+<CNT-UUID> >>>> 36.746404 <ID> 10 moving .rgw+<CNT-UUID> to cache LRU end >>>> 36.746426 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 36.746431 <ID> 20 Read xattr: user.rgw.idtag >>>> 36.746433 <ID> 20 Read xattr: user.rgw.manifest >>>> 36.746452 <ID> 10 cache get: name=.rgw+<CNT-UUID> : hit >>>> 36.746481 <ID> 20 rgw_get_bucket_info: bucket instance: >>>> <CNT-UUID>(@{i=.rgw.buckets.index}.rgw.buckets[default.78418684.119116]) >>>> 36.746491 <ID> 20 reading from >>>> .rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116 >>>> 36.746549 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 >>>> obj=.rgw:.bucket.meta.<CNT-UUID>:default.78418684.119116 >>>> state=0x7f02b00ce638 s->prefetch_data=0 >>>> 36.746585 <ID> 10 cache get: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : type miss >>>> (requested=22, cached=19) >>>> 36.747938 <ID> 10 cache put: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 >>>> 36.747955 <ID> 10 moving >>>> .rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 to cache LRU 
end >>>> 36.747963 <ID> 10 updating xattr: name=user.rgw.acl bl.length()=177 >>>> 36.747972 <ID> 20 get_obj_state: s->obj_tag was set empty >>>> 36.747975 <ID> 20 Read xattr: user.rgw.acl >>>> 36.747977 <ID> 20 Read xattr: user.rgw.idtag >>>> 36.747978 <ID> 20 Read xattr: user.rgw.manifest >>>> 36.747985 <ID> 10 cache get: >>>> name=.rgw+.bucket.meta.<CNT-UUID>:default.78418684.119116 : hit >>>> 36.748025 <ID> 15 Read AccessControlPolicy<AccessControlPolicy >>>> xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>documentstore</ID><DisplayName>Document >>>> Store</DisplayName></Owner><AccessControlList><Grant><Grantee >>>> xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >>>> xsi:type="CanonicalUser"><ID>documentstore</ID><DisplayName>Document >>>> Store</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy> >>>> 36.748037 <ID> 2 req 983095:4.861888:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:init op >>>> 36.748043 <ID> 2 req 983095:4.861895:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op mask >>>> 36.748046 <ID> 20 required_mask= 2 user.op_mask=7 >>>> 36.748050 <ID> 2 req 983095:4.861902:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op permissions >>>> 36.748054 <ID> 5 Searching permissions for uid=documentstore mask=50 >>>> 36.748056 <ID> 5 Found permission: 15 >>>> 36.748058 <ID> 5 Searching permissions for group=1 mask=50 >>>> 36.748060 <ID> 5 Permissions for group not found >>>> 36.748061 <ID> 5 Searching permissions for group=2 mask=50 >>>> 36.748063 <ID> 5 Permissions for group not found >>>> 36.748064 <ID> 5 Getting permissions id=documentstore owner=documentstore >>>> perm=2 >>>> 36.748066 <ID> 10 uid=documentstore requested perm (type)=2, policy >>>> perm=2, >>>> user_perm_mask=2, acl perm=2 >>>> 36.748069 <ID> 2 req 983095:4.861921:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:verifying op params >>>> 36.748072 <ID> 2 req 983095:4.861924:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:executing >>>> 36.748200 <ID> 20 get_obj_state: rctx=0x7f0378fd8250 obj=<CNT-UUID>:version >>>> state=0x7f02b0042618 s->prefetch_data=0 >>>> 36.802077 <ID> 10 setting object write_tag=default.78418684.983095 >>>> 36.818727 <ID> 2 req 983095:4.932579:swift:PUT >>>> /swift/v1/<CNT-UUID>/version:put_obj:http status=201 >>>> >>>> ============================== >>>> >>>> >>>> -- >>>> Daniel Schneller >>>> Mobile Development Lead >>>> >>>> CenterDevice GmbH | Merscheider Straße 1 >>>> | 42699 Solingen >>>> tel: +49 1754155711 | Deutschland >>>> daniel.schnel...@centerdevice.com | www.centerdevice.com >>>> >>>> >>>> >>>> >>>> On 06 Oct 2014, at 19:26, Yehuda Sadeh <yeh...@redhat.com> wrote: >>>> >>>> It'd be interesting to see which rados operation is slowing down the >>>> requests. Can you provide a log dump of a request (with 'debug rgw = >>>> 20', and 'debug ms = 1'). This might give us a better idea as to >>>> what's going on. >>>> >>>> Thanks, >>>> Yehuda >>>> >>>> On Mon, Oct 6, 2014 at 10:05 AM, Daniel Schneller >>>> <daniel.schnel...@centerdevice.com> wrote: >>>> >>>> Hi again! >>>> >>>> We have done some tests regarding the limits of storing lots and >>>> lots of buckets through Rados Gateway into Ceph. >>>> >>>> Our test used a single user for which we removed the default max >>>> buckets limit. It then continuously created containers - both empty >>>> and such with 10 objects of around 100k random data in them. 
>>>> >>>> With 3 parallel processes we saw relatively consistent time of >>>> about 500-700ms per such container. >>>> >>>> This kept steady until we reached approx. 3 million containers >>>> after which the time per insert sharply went up to currently >>>> around 1600ms and rising. Due to some hiccups with network >>>> equipment the tests were aborted a few times, but then resumed without >>>> deleting any of the previous runs created containers, so the actual >>>> number might be 2.8 or 3.2 million, but still in that ballpark. >>>> We aborted the test here. >>>> >>>> Judging by the advice given earlier (see quoted mail below) that >>>> we might hit a limit on some per-user data structures, we created >>>> another user account, removed its max-bucket limit as well and >>>> restarted the benchmark with that one, _expecting_ the times to be >>>> down to the original range of 500-700ms. >>>> >>>> However, what we are seeing is that the times stay at the 1600ms >>>> and higher levels even for that fresh account. >>>> >>>> Here is the output of `rados df`, reformatted to fit the email. >>>> clones, degraded and unfound were 0 in all cases and have been >>>> left out for clarity: >>>> >>>> .rgw >>>> ========================= >>>> KB: 1,966,932 >>>> objects: 9,094,552 >>>> rd: 195,747,645 >>>> rd KB: 153,585,472 >>>> wr: 30,191,844 >>>> wr KB: 10,751,065 >>>> >>>> .rgw.buckets >>>> ========================= >>>> KB: 2,038,313,855 >>>> objects: 22,088,103 >>>> rd: 5,455,123 >>>> rd KB: 408,416,317 >>>> wr: 149,377,728 >>>> wr KB: 1,882,517,472 >>>> >>>> .rgw.buckets.index >>>> ========================= >>>> KB: 0 >>>> objects: 5,374,376 >>>> rd: 267,996,778 >>>> rd KB: 262,626,106 >>>> wr: 107,142,891 >>>> wr KB: 0 >>>> >>>> .rgw.control >>>> ========================= >>>> KB: 0 >>>> objects: 8 >>>> rd: 0 >>>> rd KB: 0 >>>> wr: 0 >>>> wr KB: 0 >>>> >>>> .rgw.gc >>>> ========================= >>>> KB: 0 >>>> objects: 32 >>>> rd: 5,554,407 >>>> rd KB: 5,713,942 >>>> wr: 8,355,934 >>>> wr KB: 0 >>>> >>>> .rgw.root >>>> ========================= >>>> KB: 1 >>>> objects: 3 >>>> rd: 524 >>>> rd KB: 346 >>>> wr: 3 >>>> wr KB: 3 >>>> >>>> >>>> We would very much like to understand what is going on here >>>> in order to decide if Rados Gateway is a viable option to base >>>> our production system on (where we expect similar counts >>>> as in the benchmark), or if we need to investigate using librados >>>> directly which we would like to avoid if possible. >>>> >>>> Any advice on what configuration parameters to check or >>>> which additional information to provide to analyze this would be >>>> very much welcome. >>>> >>>> Cheers, >>>> Daniel >>>> >>>> >>>> -- >>>> Daniel Schneller >>>> Mobile Development Lead >>>> >>>> CenterDevice GmbH | Merscheider Straße 1 >>>> | 42699 Solingen >>>> tel: +49 1754155711 | Deutschland >>>> daniel.schnel...@centerdevice.com | www.centerdevice.com >>>> >>>> >>>> >>>> >>>> On 10 Sep 2014, at 19:42, Gregory Farnum <g...@inktank.com> wrote: >>>> >>>> On Wednesday, September 10, 2014, Daniel Schneller >>>> <daniel.schnel...@centerdevice.com> wrote: >>>> >>>> >>>> On 09 Sep 2014, at 21:43, Gregory Farnum <g...@inktank.com> wrote: >>>> >>>> >>>> Yehuda can talk about this with more expertise than I can, but I think >>>> it should be basically fine. By creating so many buckets you're >>>> decreasing the effectiveness of RGW's metadata caching, which means >>>> >>>> the initial lookup in a particular bucket might take longer. >>>> >>>> >>>> Thanks for your thoughts. 
With “initial lookup in a particular bucket” >>>> do you mean accessing any of the objects in a bucket? If we directly >>>> access the object (not enumerating the buckets content), would that >>>> still be an issue? >>>> Just trying to understand the inner workings a bit better to make >>>> more educated guesses :) >>>> >>>> >>>> >>>> When doing an object lookup, the gateway combines the "bucket ID" with a >>>> mangled version of the object name to try and do a read out of RADOS. It >>>> first needs to get that bucket ID though -- it will cache an the bucket >>>> name->ID mapping, but if you have a ton of buckets there could be enough >>>> entries to degrade the cache's effectiveness. (So, you're more likely to >>>> pay >>>> that extra disk access lookup.) >>>> >>>> >>>> >>>> >>>> The big concern is that we do maintain a per-user list of all their >>>> buckets — which is stored in a single RADOS object — so if you have an >>>> extreme number of buckets that RADOS object could get pretty big and >>>> become a bottleneck when creating/removing/listing the buckets. You >>>> >>>> >>>> Alright. Listing buckets is no problem, that we don’t do. Can you >>>> say what “pretty big” would be in terms of MB? How much space does a >>>> bucket record consume in there? Based on that I could run a few numbers. >>>> >>>> >>>> >>>> Uh, a kilobyte per bucket? You could look it up in the source (I'm on my >>>> phone) but I *believe* the bucket name is allowed to be larger than the >>>> rest >>>> combined... >>>> More particularly, though, if you've got a single user uploading documents, >>>> each creating a new bucket, then those bucket creates are going to >>>> serialize >>>> on this one object. >>>> -Greg >>>> >>>> >>>> >>>> >>>> should run your own experiments to figure out what the limits are >>>> there; perhaps you have an easy way of sharding up documents into >>>> different users. >>>> >>>> >>>> Good advice. We can do that per distributor (an org unit in our >>>> software) to at least compartmentalize any potential locking issues >>>> in this area to that single entity. Still, there would be quite >>>> a lot of buckets/objects per distributor, so some more detail on >>>> the above items would be great. >>>> >>>> Thanks a lot! >>>> >>>> >>>> Daniel >>>> >>>> >>>> >>>> >>>> -- >>>> Software Engineer #42 @ http://inktank.com | http://ceph.com >>>> >>>> >>>> >>>> _______________________________________________ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> >>>> >>>> >>>> _______________________________________________ >>>> ceph-users mailing list >>>> ceph-users@lists.ceph.com >>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >>>> >> >>