Lucas,

Thanks for all the detailed information. This is not expected behavior. What MIME type are you using to store the long integer data (64-bit binary, I assume)?
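For example, something along these lines (the key and value here are only placeholders) stores a value with an explicit content type and then shows the Content-Type that Riak returns in the response headers:

# store a long integer as plain text with an explicit content type
curl -X PUT -H 'Content-Type: text/plain' \
  http://localhost:8098/buckets/ttl_stg/keys/EXAMPLEKEY -d '1413791765000'

# -v prints the response headers, including the stored Content-Type
curl -v -X GET http://localhost:8098/buckets/ttl_stg/keys/EXAMPLEKEY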
I'd like to try and reproduce this. There have been issues with TTL and max_memory, but they should have been fixed for Riak 2.0.
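As a rough sketch, a reproduction load could look something like the following (this script is illustrative only; it assumes the ttl_stg memory-backend bucket plus the 47-character keys and long-integer values described later in this thread):

#!/bin/bash
# Illustrative load generator: write many short-lived keys to the ttl_stg
# memory backend over the HTTP API, then check ETS memory usage afterwards.
for i in $(seq 1 100000); do
  # 47-character alphanumeric key and a long-integer value, matching the
  # schema described below
  KEY=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 47)
  curl -s -X PUT -H 'Content-Type: text/plain' \
    "http://localhost:8098/buckets/ttl_stg/keys/${KEY}" \
    -d "$(date +%s%N)" > /dev/null
done
riak-admin status | grep memory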
--
Luke Bakken
Engineer / CSE
lbak...@basho.com

On Mon, Oct 20, 2014 at 1:56 AM, Lucas Grijander <lucasgrinjande...@gmail.com> wrote:
> Hi Luke,
>
> Indeed, when I removed the thousands of requests, the memory stabilized.
> However, the memory consumption is still very high:
>
> riak-admin status | grep memory
> memory_total : 18494760128
> memory_processes : 145363184
> memory_processes_used : 142886424
> memory_system : 18349396944
> memory_atom : 561761
> memory_atom_used : 554496
> memory_binary : 7108243240
> memory_code : 13917820
> memory_ets : 11200328880
>
> I have also tested with Riak 1.4.10 and the behavior is the same.
>
> Is it normal that "memory_ets" is above 10GB when we have a "ring_size" of
> 16 and max_memory_per_vnode = 250MB?
>
> 2014-10-15 20:50 GMT+02:00 Lucas Grijander <lucasgrinjande...@gmail.com>:
>>
>> Hi Luke.
>>
>> About the first issue:
>>
>> - From the beginning, the servers have all been running ntpd. They are
>> Ubuntu 14.04 and the ntpd service is installed and running by default.
>> - Anti-entropy was also disabled from the beginning:
>>
>> {anti_entropy,{off,[]}},
>>
>> About the second issue, I am perplexed because, after 2 restarts of the
>> Riak server, memory consumption is now high but it is not growing as it
>> did on previous days. The only change was to remove this code (it was
>> called thousands of times per second). It was a possible workaround for
>> the earlier TTL problem, but it is now useless because the TTL works
>> fine with this node alone:
>>
>> self.db.delete(key)
>> self.db.get(key, r=1)
>>
>> # riak-admin status | grep memory
>> memory_total : 18617871264
>> memory_processes : 224480232
>> memory_processes_used : 222700176
>> memory_system : 18393391032
>> memory_atom : 561761
>> memory_atom_used : 552862
>> memory_binary : 7135206080
>> memory_code : 13779729
>> memory_ets : 11209256232
>>
>> The problem is that I don't remember whether the code change happened
>> before or after the second restart. I am going to restart the Riak
>> server again and will report back on whether the possible memory leak
>> reappears.
>>
>> These are the props of the bucket:
>>
>> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":1,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":1,"rw":"quorum","small_vclock":50,"w":1,"young_vclock":20}}
>>
>> The data we put into the bucket all follow this schema:
>>
>> KEY: Alphanumeric with a length of 47
>> DATA: Long integer.
>>
>> # riak-admin status | grep puts
>> vnode_puts : 84708
>> vnode_puts_total : 123127430
>> node_puts : 83169
>> node_puts_total : 123128062
>>
>> # riak-admin status | grep gets
>> vnode_gets : 162314
>> vnode_gets_total : 240433213
>> node_gets : 162317
>> node_gets_total : 240433216
>>
>> 2014-10-14 16:26 GMT+02:00 Luke Bakken <lbak...@basho.com>:
>>>
>>> Hi Lucas,
>>>
>>> With regard to the mysterious key deletion / resurrection, please do
>>> the following:
>>>
>>> * Ensure your servers are all running ntpd and have their time
>>> synchronized as closely as possible.
>>> * Disable anti-entropy. I suspect this is causing the strange behavior
>>> you're seeing with keys.
>>>
>>> Your single node cluster memory consumption issue is a bit of a
>>> puzzler. Based on your previous emails, I'm assuming you're using
>>> default bucket settings rather than bucket types, and that allow_mult
>>> is still false for your ttl_stg bucket. Can you tell me more about the
>>> data you're putting into that bucket for testing? I'll try to
>>> reproduce it with my single-node cluster.
>>>
>>> --
>>> Luke Bakken
>>> Engineer / CSE
>>> lbak...@basho.com
>>>
>>> On Mon, Oct 13, 2014 at 5:02 PM, Lucas Grijander
>>> <lucasgrinjande...@gmail.com> wrote:
>>> > Hi Luke.
>>> >
>>> > I really appreciate your efforts to reproduce the problem. I think
>>> > the configs are right. I have also been doing a lot of tests, and
>>> > with 1 server/node the memory bucket works flawlessly, as in your
>>> > test. The Riak cluster where we have the problem has a multi_backend
>>> > with 1 memory backend, 2 bitcask backends and 2 leveldb backends. I
>>> > only changed the connection parameter of the memory backend in our
>>> > production code to point to a new "cluster" with only 1 node, with
>>> > the same Riak config but with only 1 memory backend under the multi
>>> > configuration, and, as I said, everything is fine; the problem
>>> > vanished. I deduce that the problem appears only with more than 1
>>> > node and with a lot of requests.
>>> >
>>> > In my tests with the production cluster that has the problem (4
>>> > nodes), I finally realized that the TTL is working but, randomly and
>>> > suddenly, keys that were already deleted reappear and keys within
>>> > their TTL disappear :-? (Maybe something related to some internal
>>> > ETS table?) That is when I can retrieve keys that have already
>>> > expired.
>>> >
>>> > In summary:
>>> >
>>> > - With a cluster of 4 nodes (config below): all OK for a while, then
>>> > suddenly we lose approximately the last 20 seconds of keys and OLD
>>> > keys appear in the list returned by:
>>> > curl -X GET http://localhost:8098/buckets/ttl_stg/keys?keys=true
>>> >
>>> > buckets.default.last_write_wins = true
>>> > bitcask.io_mode = erlang
>>> > multi_backend.ttl_stg.storage_backend = memory
>>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 25MB
>>> > anti_entropy = passive
>>> > ring_size = 256
>>> >
>>> > - With 1 node: all OK
>>> >
>>> > buckets.default.n_val = 1
>>> > buckets.default.last_write_wins = true
>>> > buckets.default.r = 1
>>> > buckets.default.w = 1
>>> > multi_backend.ttl_stg.storage_backend = memory
>>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 250MB
>>> > ring_size = 16
>>> >
>>> > Another note: with this 1 node (32GB RAM) and only the memory backend
>>> > activated, I have realized that the memory consumption grows without
>>> > control:
>>> >
>>> > # riak-admin status | grep memory
>>> > memory_total : 17323130960
>>> > memory_processes : 235043016
>>> > memory_processes_used : 233078456
>>> > memory_system : 17088087944
>>> > memory_atom : 561761
>>> > memory_atom_used : 561127
>>> > memory_binary : 6737787976
>>> > memory_code : 14370908
>>> > memory_ets : 10295224544
>>> >
>>> > # riak-admin diag -d debug
>>> > [debug] Local RPC: os:getpid([]) [5000]
>>> > [debug] Running shell command: ps -o pmem,rss -p 17521
>>> > [debug] Shell command output:
>>> > %MEM      RSS
>>> > 60.5 19863800
>>> >
>>> > Wow, 18.9GB when max_memory_per_vnode = 250MB. That is far from the
>>> > expected value of 250MB * 16 vnodes = 4000MB. Is that correct?
>>> >
>>> > This is the riak-admin vnode-status of 1 vnode; the other 15 show
>>> > similar data:
>>> >
>>> > VNode: 1370157784997721485815954530671515330927436759040
>>> > Backend: riak_kv_multi_backend
>>> > Status:
>>> > [{<<"ttl_stg">>,
>>> >   [{mod,riak_kv_memory_backend},
>>> >    {data_table_status,[{compressed,false},
>>> >                        {memory,1156673},
>>> >                        {owner,<8343.9466.104>},
>>> >                        {heir,none},
>>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040},
>>> >                        {size,29656},
>>> >                        {node,'riak@xxxxxxxx'},
>>> >                        {named_table,false},
>>> >                        {type,ordered_set},
>>> >                        {keypos,1},
>>> >                        {protection,protected}]},
>>> >    {index_table_status,[{compressed,false},
>>> >                         {memory,89},
>>> >                         {owner,<8343.9466.104>},
>>> >                         {heir,none},
>>> >                         {name,riak_kv_1370157784997721485815954530671515330927436759040_i},
>>> >                         {size,0},
>>> >                         {node,'riak@xxxxxxxxx'},
>>> >                         {named_table,false},
>>> >                         {type,ordered_set},
>>> >                         {keypos,1},
>>> >                         {protection,protected}]},
>>> >    {time_table_status,[{compressed,false},
>>> >                        {memory,75968936},
>>> >                        {owner,<8343.9466.104>},
>>> >                        {heir,none},
>>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040_t},
>>> >                        {size,2813661},
>>> >                        {node,'riak@xxxxxxxxx'},
>>> >                        {named_table,false},
>>> >                        {type,ordered_set},
>>> >                        {keypos,1},
>>> >                        {protection,protected}]}]}]
>>> >
>>> > Thanks!
>>> >
>>> > 2014-10-13 22:30 GMT+02:00 Luke Bakken <lbak...@basho.com>:
>>> >>
>>> >> Hi Lucas,
>>> >>
>>> >> I've tried reproducing this using a local Riak 2.0.1 node, however
>>> >> TTL is working as expected.
>>> >>
>>> >> Here is the configuration I have in /etc/riak/riak.conf:
>>> >>
>>> >> storage_backend = multi
>>> >> multi_backend.default = bc_default
>>> >>
>>> >> multi_backend.ttl_stg.storage_backend = memory
>>> >> multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> >> multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 4MB
>>> >>
>>> >> multi_backend.bc_default.storage_backend = bitcask
>>> >> multi_backend.bc_default.bitcask.data_root = /var/lib/riak/bc_default
>>> >> multi_backend.bc_default.bitcask.io_mode = erlang
>>> >>
>>> >> This translates to the following in
>>> >> /var/lib/riak/generated.configs/app.2014.10.13.13.13.29.config:
>>> >>
>>> >> {multi_backend_default,<<"bc_default">>},
>>> >> {multi_backend,
>>> >>  [{<<"ttl_stg">>,riak_kv_memory_backend,[{ttl,90},{max_memory,4}]},
>>> >>   {<<"bc_default">>,riak_kv_bitcask_backend,
>>> >>    [{io_mode,erlang},
>>> >>     {expiry_grace_time,0},
>>> >>     {small_file_threshold,10485760},
>>> >>     {dead_bytes_threshold,134217728},
>>> >>     {frag_threshold,40},
>>> >>     {dead_bytes_merge_trigger,536870912},
>>> >>     {frag_merge_trigger,60},
>>> >>     {max_file_size,2147483648},
>>> >>     {open_timeout,4},
>>> >>     {data_root,"/var/lib/riak/bc_default"},
>>> >>     {sync_strategy,none},
>>> >>     {merge_window,always},
>>> >>     {max_fold_age,-1},
>>> >>     {max_fold_puts,0},
>>> >>     {expiry_secs,-1},
>>> >>     {require_hint_crc,true}]}]}]},
>>> >>
>>> >> I set the bucket properties to use the ttl_stg backend:
>>> >>
>>> >> root@UBUNTU-12-1:~# cat ttl_stg-props.json
>>> >> {"props":{"name":"ttl_stg","backend":"ttl_stg"}}
>>> >>
>>> >> root@UBUNTU-12-1:~# curl -XPUT -H'Content-type: application/json' localhost:8098/buckets/ttl_stg/props --data-ascii @ttl_stg-props.json
>>> >>
>>> >> root@UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/props
>>> >>
>>> >> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
>>> >>
>>> >> And I used the following statement to PUT test data:
>>> >>
>>> >> curl -XPUT localhost:8098/buckets/ttl_stg/keys/1 -d "TEST $(date)"
>>> >>
>>> >> After 90 seconds, this is the response I get from Riak:
>>> >>
>>> >> root@UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/keys/1
>>> >> not found
>>> >>
>>> >> I would carefully check all of the app.config / riak.conf files in
>>> >> your cluster, the output of "riak config effective", and the bucket
>>> >> properties of the buckets you expect to be using the memory backend
>>> >> with TTL. I also recommend using the localhost:8098/buckets/ endpoint
>>> >> instead of the deprecated riak/ endpoint.
>>> >>
>>> >> Please let me know if you have additional questions.
>>> >> --
>>> >> Luke Bakken
>>> >> Engineer / CSE
>>> >> lbak...@basho.com
>>> >>
>>> >> On Fri, Oct 3, 2014 at 11:32 AM, Lucas Grijander
>>> >> <lucasgrinjande...@gmail.com> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > I have a memory backend in production with Riak 2.0.1, 4 servers
>>> >> > and 256 vnodes. The servers have the same date and time.
>>> >> >
>>> >> > I have seen odd behavior with the TTL.
>>> >> >
>>> >> > This is the config:
>>> >> >
>>> >> > {<<"ttl_stg">>,riak_kv_memory_backend,
>>> >> >  [{ttl,90},{max_memory,25}]},
>>> >> >
>>> >> > For example, see this GET response on one of the Riak servers:
>>> >> >
>>> >> > < HTTP/1.1 200 OK
>>> >> > < X-Riak-Vclock: a85hYGBgzGDKBVIc4otdfgR/7bfIYEpkzGNlKI1efJYvCwA=
>>> >> > < Vary: Accept-Encoding
>>> >> > * Server MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) is not blacklisted
>>> >> > < Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
>>> >> > < Link: </riak/ttl_stg>; rel="up"
>>> >> > < Last-Modified: Fri, 03 Oct 2014 17:40:05 GMT
>>> >> > < ETag: "3c8bGoifWcOCSVn0otD5nI"
>>> >> > < Date: Fri, 03 Oct 2014 17:47:50 GMT
>>> >> > < Content-Type: application/json
>>> >> > < Content-Length: 17
>>> >> >
>>> >> > If the TTL is 90 seconds, why doesn't the GET return "not found"
>>> >> > when the difference between "Last-Modified" and "Date" (of the curl
>>> >> > request) is greater than the TTL? (Here the difference is 7 minutes
>>> >> > and 45 seconds, far more than 90 seconds.)
>>> >> >
>>> >> > Thanks in advance!

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com