Hello!

On CentOS 6.6:

    find / -name "erl"
    /usr/lib64/riak/erts-5.10.3/bin/erl

and then we get

    Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:24:24] [async-threads:10] [kernel-poll:false] [frame-pointer]

Many thanks!
Johan Gall

-----Original Message-----
From: "Nick Marino"<nmar...@basho.com>
To: "ジョハンガル"<gall.jo...@linecorp.com>;
Cc: "崔榮虎"<youngho.c...@linecorp.com>; "riak-users"<riak-users@lists.basho.com>;
Sent: 2015-07-29 (Wed) 03:18:13
Subject: Re: fresh riak 2.1.1 install systematically slowing down and crashing in ~1day

Great! Thanks for your help in verifying this issue. I'll look into getting a patch backported to Basho's Erlang fork. Just to verify, what is the exact version of Erlang you are running?

If it seems like it might take a while to get a patched version of Erlang vetted and built, I could potentially build a quick temporary patch that you could try installing on your system, which may fix the problem in the short term.

Thanks again,
Nick

On Tue, Jul 28, 2015 at 12:53 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:

Hello!

    lists:keyfind(size, 1, ets:info(element(2, lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <- ets:all()])))).

returns:

    {size,996816}

and it keeps growing and growing.

So I think we hit our culprit!

-----Original Message-----
From: "Nick Marino"<nmar...@basho.com>
To: "ジョハンガル"<gall.jo...@linecorp.com>;
Cc: "riak-users"<riak-users@lists.basho.com>;
Sent: 2015-07-28 (Tue) 02:37:26
Subject: Re: fresh riak 2.1.1 install systematically slowing down and crashing in ~1day

Hi,

I have a strong suspicion that you're encountering a resource leak bug in the Erlang SSL libraries that Riak uses. By odd coincidence, I ran into a very similar issue at my last job working on a completely different project, and I helped develop a fix a couple of years ago.
The patch was accepted somewhere in the Erlang R17 timeframe, but Riak doesn't support R17 yet so you probably don't have a version of Erlang with this particular fix in place.

To check whether this resource leak is being hit, you can attach an Erlang shell to a node using the "riak attach" command and copy/paste this one line of code:

    lists:keyfind(size, 1, ets:info(element(2, lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <- ets:all()])))).

Running the above command should give you something that looks like this:

    {size, 0}

You will likely see a number larger than 0, but the general format should be the same. In normal usage, this number should be fairly small, but in your case you may see it continue to grow larger and larger over time (specifically, you may see it incrementing by one for each new incoming SSL connection, and never decrementing, even after connections are closed). If this value starts to get up into the tens or hundreds of thousands or more, establishing new SSL connections will start to get slower and slower, much like you're seeing.

If you can verify that this size value continues to grow over time, we can take a look at backporting the relevant fix to our custom Basho fork of Erlang. Let me know what you find, and we can take it from there.

Thanks!
Nick

On Mon, Jul 27, 2015 at 5:59 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:

Hello,
I would really appreciate help with our authentication problem.

We have developed an orchestration platform for internal cloud needs using Riak KV 2.1.1 (our first deployment of the technology). Up to now we were only using raw HTTP, but for security purposes we have been switching to TLS v1.2 with Protocol Buffers clients (the standard Riak Java client).

At first everything works smoothly. Then eventually (after a few hours), with a load of about 10-15 authentications per second, our cluster CPU usage starts to slowly ramp up to saturation until the data store turns unresponsive. (Since we cannot authenticate, there are no other requests.)

Our cluster is currently composed of two 24-core Xeon machines with 64GB of RAM each and bonded 2x1Gbps NICs, running on standard updated CentOS 6.6. RAM consumption doesn't go over 10%. We are currently storing a few megabytes of data at most.

At first,

    curl -vvv -u **:** https://****

will do the first 2 steps of SSL authentication, CLIENT HELLO and SERVER HELLO, and the 3rd message (the client receiving CERTIFICATE from the server) will get slower and slower. Then the SSL handshake in the Riak Java client will simply time out.

Reading the logs and combining them with

https://github.com/basho/riak_api/blob/develop/src/riak_api_pb_server.erl

tells me that ssl:ssl_accept never returns (well, it eventually returns with Reason as the atom closed, seemingly because the client times out and closes the connection).

    supervisor:which_children(whereis(riak_api_pb_sup))

gives me a count of ~7000 processes. etop refuses to start.

Have you experienced anything similar?

As for our riak.conf configuration:

## Acceptable values:
##   - an integer
erlang.async_threads = 64

ring_size = 32

.. ssl things setup ..
storage_backend = multi

###
multi_backend.default = bitcask_99

### 1h - ephemeral - no safety
multi_backend.bitcask_1h.storage_backend = bitcask
multi_backend.bitcask_1h.bitcask.expiry = 1h
multi_backend.bitcask_1h.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_1h.bitcask.data_root = $(platform_data_dir)/bitcask_1h
multi_backend.bitcask_1h.bitcask.max_file_size = 2GB
multi_backend.bitcask_1h.bitcask.merge.thresholds.fragmentation = 99

### 3d - ephemeral - a weekend to restart data generator before auto expiry
multi_backend.bitcask_3d.storage_backend = bitcask
multi_backend.bitcask_3d.bitcask.expiry = 3d
multi_backend.bitcask_3d.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_3d.bitcask.data_root = $(platform_data_dir)/bitcask_3d
multi_backend.bitcask_3d.bitcask.max_file_size = 2GB
multi_backend.bitcask_3d.bitcask.merge.thresholds.fragmentation = 99

### 3m - long term logs
multi_backend.bitcask_3m.storage_backend = bitcask
multi_backend.bitcask_3m.bitcask.expiry = 3m
multi_backend.bitcask_3m.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_3m.bitcask.data_root = $(platform_data_dir)/bitcask_3m
multi_backend.bitcask_3m.bitcask.max_file_size = 2GB
multi_backend.bitcask_3m.bitcask.merge.thresholds.fragmentation = 45

### persistent - low amount of data
multi_backend.bitcask_99.storage_backend = bitcask
multi_backend.bitcask_99.bitcask.expiry = off
multi_backend.bitcask_99.bitcask.data_root = $(platform_data_dir)/bitcask_99
multi_backend.bitcask_99.bitcask.max_file_size = 128MB
multi_backend.bitcask_99.bitcask.merge.thresholds.fragmentation = 20

### SECURITY RELATED CUSTOM CONF ###
tls_protocols.sslv3 = off
tls_protocols.tlsv1 = off
tls_protocols.tlsv1.1 = on
tls_protocols.tlsv1.2 = on
secure_referer_check = on
honor_cipher_order = on

-----------------------------------------

riak-admin security ciphers
ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA256:ECDH-RSA-AES256-SHA384:ECDH-ECDSA-AES256-SHA384:AES256-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:DHE-RSA-AES128-SHA256:DHE-DSS-AES128-SHA256:ECDH-RSA-AES128-SHA256:ECDH-ECDSA-AES128-SHA256:AES128-SHA256

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
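
The check discussed in this thread (reading the size of the ssl_otp_cacertificate_db ETS table and seeing whether it keeps growing) can be sketched as a small helper module. This is only a sketch under assumptions: the module name cacert_db_check and the functions table_size/1 and sample/2 are hypothetical and are not part of Riak or OTP; only the table name and the ets calls come from the thread. It also assumes you are able to compile the module and load it on the node you attach to.

```erlang
%% Hypothetical helper, not part of Riak or OTP.
-module(cacert_db_check).
-export([table_size/1, sample/2]).

%% Return the number of objects in the first ETS table whose name is Name,
%% or not_found if no such table exists on this node.
table_size(Name) ->
    case [T || T <- ets:all(), ets:info(T, name) =:= Name] of
        [Tab | _] ->
            case ets:info(Tab, size) of
                undefined -> not_found;   % table vanished between the two calls
                Size      -> Size
            end;
        [] ->
            not_found
    end.

%% Read the table size twice, IntervalMs milliseconds apart, and return
%% {First, Second, Delta}, or not_found if the table is missing.
sample(Name, IntervalMs) ->
    case table_size(Name) of
        not_found ->
            not_found;
        First ->
            timer:sleep(IntervalMs),
            case table_size(Name) of
                not_found -> not_found;
                Second    -> {First, Second, Second - First}
            end
    end.
```

From an attached shell, cacert_db_check:sample(ssl_otp_cacertificate_db, 60000). would take two readings one minute apart. A delta that stays positive while the number of open connections is steady would be consistent with the leak described above; a single reading on its own does not show growth.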