Hi Luke,

Indeed, once I removed the thousands of requests, the memory stabilized. However, memory consumption is still very high:
# riak-admin status | grep memory
memory_total : 18494760128
memory_processes : 145363184
memory_processes_used : 142886424
memory_system : 18349396944
memory_atom : 561761
memory_atom_used : 554496
memory_binary : 7108243240
memory_code : 13917820
memory_ets : 11200328880

I have also tested with Riak 1.4.10 and the behavior is the same. Is it normal for "memory_ets" to be over 10GB when we have a "ring_size" of 16 and max_memory_per_vnode = 250MB?

2014-10-15 20:50 GMT+02:00 Lucas Grijander <lucasgrinjande...@gmail.com>:
> Hi Luke.
>
> About the first issue:
>
> - From the beginning, the servers have all been running ntpd. They are
> Ubuntu 14.04 and the ntpd service is installed and running by default.
> - Anti-entropy was also disabled from the beginning:
>
> {anti_entropy,{off,[]}},
>
> About the second issue, I am perplexed because, after 2 restarts of the
> Riak server, there is now high memory consumption, but it is not growing
> like in the previous days. The only change was to remove this code (it
> was called thousands of times/s). It was a possible workaround for the
> earlier problem with the TTL, but this code is now useless because the
> TTL is working fine with this node alone:
>
> self.db.delete(key)
> self.db.get(key, r=1)
>
> # riak-admin status|grep memory
> memory_total : 18617871264
> memory_processes : 224480232
> memory_processes_used : 222700176
> memory_system : 18393391032
> memory_atom : 561761
> memory_atom_used : 552862
> memory_binary : 7135206080
> memory_code : 13779729
> memory_ets : 11209256232
>
> The problem is that I don't remember whether the code change happened
> before or after the second restart. I am going to restart the Riak
> server again and will report back on whether the "possible memory leak"
> reappears.
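For what it's worth, the figures above can be sanity-checked with a few lines of arithmetic (a rough sketch; it assumes the max_memory_per_vnode cap is meant to bound all per-vnode ETS storage, which is exactly the question at hand):

```python
# Compare observed ETS memory against the configured per-vnode ceiling.
# Figures are taken from the riak-admin status output above.
MB = 1024 ** 2
GB = 1024 ** 3

memory_ets = 11200328880            # bytes, from riak-admin status
ring_size = 16
max_memory_per_vnode = 250 * MB     # configured cap per vnode

ceiling = ring_size * max_memory_per_vnode

print("observed memory_ets : %.1f GB" % (memory_ets / GB))   # -> 10.4 GB
print("configured ceiling  : %.1f GB" % (ceiling / GB))      # -> 3.9 GB
print("overshoot           : %.1fx" % (memory_ets / ceiling))  # -> 2.7x
```

So ETS is holding roughly 2.7x the configured ceiling, which supports the suspicion that something other than the stored values is growing unbounded.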
>
> This is the props of the bucket:
>
> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":1,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":1,"rw":"quorum","small_vclock":50,"w":1,"young_vclock":20}}
>
> All the data we put into the bucket follows this schema:
>
> KEY: alphanumeric with a length of 47
> DATA: long integer
>
> # riak-admin status|grep puts
> vnode_puts : 84708
> vnode_puts_total : 123127430
> node_puts : 83169
> node_puts_total : 123128062
>
> # riak-admin status|grep gets
> vnode_gets : 162314
> vnode_gets_total : 240433213
> node_gets : 162317
> node_gets_total : 240433216
>
> 2014-10-14 16:26 GMT+02:00 Luke Bakken <lbak...@basho.com>:
>
>> Hi Lucas,
>>
>> With regard to the mysterious key deletion / resurrection, please do
>> the following:
>>
>> * Ensure your servers are all running ntpd and have their time
>>   synchronized as closely as possible.
>> * Disable anti-entropy. I suspect this is causing the strange behavior
>>   you're seeing with keys.
>>
>> Your single node cluster memory consumption issue is a bit of a
>> puzzler. I'm assuming you're using default bucket settings and not
>> using bucket types based on your previous emails, and that allow_mult
>> is still false for your ttl_stg bucket. Can you tell me more about the
>> data you're putting into that bucket for testing? I'll try to
>> reproduce it with my single node cluster.
>>
>> --
>> Luke Bakken
>> Engineer / CSE
>> lbak...@basho.com
>>
>> On Mon, Oct 13, 2014 at 5:02 PM, Lucas Grijander
>> <lucasgrinjande...@gmail.com> wrote:
>> > Hi Luke.
>> >
>> > I really appreciate your efforts to reproduce the problem. I think
>> > that the configs are right.
>> > I have also been doing a lot of tests, and with 1 server/node the
>> > memory backend works flawlessly, as in your test. The Riak cluster
>> > where we have the problem has a multi_backend with 1 memory backend,
>> > 2 bitcask backends and 2 leveldb backends. I only changed the
>> > connection parameter of the memory backend in our production code to
>> > point to a new "cluster" with only 1 node, with the same Riak config
>> > but with only 1 memory backend under the multi configuration and, as
>> > I said, everything is fine; the problem vanished. I deduce that the
>> > problem appears only with more than 1 node and with a lot of requests.
>> >
>> > In my tests with the production cluster that has the problem (4
>> > nodes), I finally realized that the TTL is working but, randomly and
>> > suddenly, KEYS already deleted reappear, and KEYS with a correct TTL
>> > disappear :-? (Maybe something related to some internal ETS table?)
>> > That is when I can retrieve KEYS that have already expired.
>> >
>> > In summary:
>> >
>> > - With a cluster of 4 nodes (config below): all OK for a while, then
>> > suddenly we lose the last ~20 seconds of keys and OLD keys appear in
>> > the list:
>> > curl -X GET http://localhost:8098/buckets/ttl_stg/keys?keys=true
>> >
>> > buckets.default.last_write_wins = true
>> > bitcask.io_mode = erlang
>> > multi_backend.ttl_stg.storage_backend = memory
>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 25MB
>> > anti_entropy = passive
>> > ring_size = 256
>> >
>> > - With 1 node: all OK
>> >
>> > buckets.default.n_val = 1
>> > buckets.default.last_write_wins = true
>> > buckets.default.r = 1
>> > buckets.default.w = 1
>> > multi_backend.ttl_stg.storage_backend = memory
>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 250MB
>> > ring_size = 16
>> >
>> > Another note: with this 1 node (32GB RAM) and only the memory backend
>> > activated, I have noticed that memory consumption grows without
>> > control:
>> >
>> > # riak-admin status|grep memory
>> > memory_total : 17323130960
>> > memory_processes : 235043016
>> > memory_processes_used : 233078456
>> > memory_system : 17088087944
>> > memory_atom : 561761
>> > memory_atom_used : 561127
>> > memory_binary : 6737787976
>> > memory_code : 14370908
>> > memory_ets : 10295224544
>> >
>> > # riak-admin diag -d debug
>> > [debug] Local RPC: os:getpid([]) [5000]
>> > [debug] Running shell command: ps -o pmem,rss -p 17521
>> > [debug] Shell command output:
>> > %MEM      RSS
>> > 60.5 19863800
>> >
>> > Wow, 18.9GB when max_memory_per_vnode = 250MB. That is far from the
>> > expected value of 250MB * 16 vnodes = 4000MB. Is that correct?
>> >
>> > This is the riak-admin vnode-status of 1 vnode; the other 15 show
>> > similar data:
>> >
>> > VNode: 1370157784997721485815954530671515330927436759040
>> > Backend: riak_kv_multi_backend
>> > Status:
>> > [{<<"ttl_stg">>,
>> >   [{mod,riak_kv_memory_backend},
>> >    {data_table_status,[{compressed,false},
>> >                        {memory,1156673},
>> >                        {owner,<8343.9466.104>},
>> >                        {heir,none},
>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040},
>> >                        {size,29656},
>> >                        {node,'riak@xxxxxxxx'},
>> >                        {named_table,false},
>> >                        {type,ordered_set},
>> >                        {keypos,1},
>> >                        {protection,protected}]},
>> >    {index_table_status,[{compressed,false},
>> >                         {memory,89},
>> >                         {owner,<8343.9466.104>},
>> >                         {heir,none},
>> >                         {name,riak_kv_1370157784997721485815954530671515330927436759040_i},
>> >                         {size,0},
>> >                         {node,'riak@xxxxxxxxx'},
>> >                         {named_table,false},
>> >                         {type,ordered_set},
>> >                         {keypos,1},
>> >                         {protection,protected}]},
>> >    {time_table_status,[{compressed,false},
>> >                        {memory,75968936},
>> >                        {owner,<8343.9466.104>},
>> >                        {heir,none},
>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040_t},
>> >                        {size,2813661},
>> >                        {node,'riak@xxxxxxxxx'},
>> >                        {named_table,false},
>> >                        {type,ordered_set},
>> >                        {keypos,1},
>> >                        {protection,protected}]}]}]
>> >
>> > Thanks!
>> >
>> > 2014-10-13 22:30 GMT+02:00 Luke Bakken <lbak...@basho.com>:
>> >>
>> >> Hi Lucas,
>> >>
>> >> I've tried reproducing this using a local Riak 2.0.1 node; however,
>> >> TTL is working as expected.
>> >>
>> >> Here is the configuration I have in /etc/riak/riak.conf:
>> >>
>> >> storage_backend = multi
>> >> multi_backend.default = bc_default
>> >>
>> >> multi_backend.ttl_stg.storage_backend = memory
>> >> multi_backend.ttl_stg.memory_backend.ttl = 90s
>> >> multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 4MB
>> >>
>> >> multi_backend.bc_default.storage_backend = bitcask
>> >> multi_backend.bc_default.bitcask.data_root = /var/lib/riak/bc_default
>> >> multi_backend.bc_default.bitcask.io_mode = erlang
>> >>
>> >> This translates to the following in
>> >> /var/lib/riak/generated.configs/app.2014.10.13.13.13.29.config:
>> >>
>> >> {multi_backend_default,<<"bc_default">>},
>> >> {multi_backend,
>> >>  [{<<"ttl_stg">>,riak_kv_memory_backend,[{ttl,90},{max_memory,4}]},
>> >>   {<<"bc_default">>,riak_kv_bitcask_backend,
>> >>    [{io_mode,erlang},
>> >>     {expiry_grace_time,0},
>> >>     {small_file_threshold,10485760},
>> >>     {dead_bytes_threshold,134217728},
>> >>     {frag_threshold,40},
>> >>     {dead_bytes_merge_trigger,536870912},
>> >>     {frag_merge_trigger,60},
>> >>     {max_file_size,2147483648},
>> >>     {open_timeout,4},
>> >>     {data_root,"/var/lib/riak/bc_default"},
>> >>     {sync_strategy,none},
>> >>     {merge_window,always},
>> >>     {max_fold_age,-1},
>> >>     {max_fold_puts,0},
>> >>     {expiry_secs,-1},
>> >>     {require_hint_crc,true}]}]}]},
>> >>
>> >> I set the bucket properties to use the ttl_stg backend:
>> >>
>> >> root@UBUNTU-12-1:~# cat ttl_stg-props.json
>> >> {"props":{"name":"ttl_stg","backend":"ttl_stg"}}
>> >>
>> >> root@UBUNTU-12-1:~#
>> >> curl -XPUT -H'Content-type: application/json'
>> >>   localhost:8098/buckets/ttl_stg/props --data-ascii @ttl_stg-props.json
>> >>
>> >> root@UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/props
>> >>
>> >> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
>> >>
>> >> And I used the following statement to PUT test data:
>> >>
>> >> curl -XPUT localhost:8098/buckets/ttl_stg/keys/1 -d "TEST $(date)"
>> >>
>> >> After 90 seconds, this is the response I get from Riak:
>> >>
>> >> root@UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/keys/1
>> >> not found
>> >>
>> >> I would carefully check all of the app.config / riak.conf files in
>> >> your cluster, the output of "riak config effective", and the bucket
>> >> properties for those buckets you expect to be using the memory
>> >> backend with TTL. I also recommend using the localhost:8098/buckets/
>> >> endpoint instead of the deprecated riak/ endpoint.
>> >>
>> >> Please let me know if you have additional questions.
>> >> --
>> >> Luke Bakken
>> >> Engineer / CSE
>> >> lbak...@basho.com
>> >>
>> >> On Fri, Oct 3, 2014 at 11:32 AM, Lucas Grijander
>> >> <lucasgrinjande...@gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a memory backend in production with Riak 2.0.1, 4 servers
>> >> > and 256 vnodes. The servers have the same date and time.
>> >> >
>> >> > I have seen odd behavior with the TTL.
>> >> >
>> >> > This is the config:
>> >> >
>> >> > {<<"ttl_stg">>,riak_kv_memory_backend,
>> >> >  [{ttl,90},{max_memory,25}]},
>> >> >
>> >> > For example, see this GET response on one of the Riak servers:
>> >> >
>> >> > < HTTP/1.1 200 OK
>> >> > < X-Riak-Vclock: a85hYGBgzGDKBVIc4otdfgR/7bfIYEpkzGNlKI1efJYvCwA=
>> >> > < Vary: Accept-Encoding
>> >> > * Server MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) is not blacklisted
>> >> > < Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
>> >> > < Link: </riak/ttl_stg>; rel="up"
>> >> > < Last-Modified: Fri, 03 Oct 2014 17:40:05 GMT
>> >> > < ETag: "3c8bGoifWcOCSVn0otD5nI"
>> >> > < Date: Fri, 03 Oct 2014 17:47:50 GMT
>> >> > < Content-Type: application/json
>> >> > < Content-Length: 17
>> >> >
>> >> > If the TTL is 90 seconds, why doesn't the GET return "not found"
>> >> > when the difference between "Last-Modified" and "Date" (of the curl
>> >> > request) is greater than the TTL?
>> >> >
>> >> > Thanks in advance!
>> >> >
>> >> > _______________________________________________
>> >> > riak-users mailing list
>> >> > riak-users@lists.basho.com
>> >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
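The question in the quoted message can be checked mechanically from the two response headers: if Date minus Last-Modified exceeds the backend TTL, the key should already have expired. A small sketch, with the header values copied verbatim from the GET response above:

```python
# Compute the object's age from the HTTP headers of the quoted GET
# response and compare it against the memory backend's 90s TTL.
from email.utils import parsedate_to_datetime

TTL_SECONDS = 90

last_modified = parsedate_to_datetime("Fri, 03 Oct 2014 17:40:05 GMT")
date = parsedate_to_datetime("Fri, 03 Oct 2014 17:47:50 GMT")

age = (date - last_modified).total_seconds()
verdict = "should be expired" if age > TTL_SECONDS else "still live"
print("object age: %d s (TTL %d s) -> %s" % (age, TTL_SECONDS, verdict))
# -> object age: 465 s (TTL 90 s) -> should be expired
```

At 465 seconds the object is more than five TTLs old, so a 200 OK here does look like the expiry is not being applied.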
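One more observation on the vnode-status output quoted earlier in the thread: the {memory,N} values come from ets:info/2 and are therefore in machine words, not bytes (8 bytes per word on a 64-bit VM). Converting them shows where the ETS memory is going; this is a sketch under that 64-bit assumption, with the figures taken from the listing above:

```python
# Convert the ETS table sizes from the vnode-status listing (reported
# in words by ets:info/2) to bytes. Assumes a 64-bit Erlang VM.
WORD = 8          # bytes per machine word on a 64-bit VM
MB = 1024 ** 2

data_table_words = 1156673    # data_table_status {memory,...}, size 29656 keys
time_table_words = 75968936   # time_table_status {memory,...}, size 2813661 entries

print("data table : %.1f MB" % (data_table_words * WORD / MB))   # -> 8.8 MB
print("time table : %.1f MB" % (time_table_words * WORD / MB))   # -> 579.6 MB
print("time tables x16 vnodes : %.1f GB"
      % (time_table_words * WORD * 16 / 1024 ** 3))              # -> 9.1 GB
```

If those assumptions hold, the time tables alone would account for roughly 9GB across the 16 vnodes, close to the ~10GB memory_ets reported, while each data table stays well under the 250MB cap. That would point at the TTL bookkeeping table (2.8M entries versus 29.6k live keys) rather than the stored values as the source of the growth.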