Great! Thanks for your help in verifying this issue. I'll look into getting
a patch backported to Basho's Erlang fork.

Just to verify, what is the exact version of Erlang you are running? If it
seems like it might take a while to get a patched version of Erlang vetted
and built, I could potentially build a quick temporary patch that you could
try installing on your system, which may fix the problem in the short term.
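
A minimal sketch for reading the version from the same "riak attach" shell,
assuming the standard erlang:system_info/1 and application:get_key/2 calls
behave on Basho's fork as they do in stock OTP (the strings returned will
depend on your build):

```erlang
%% OTP release name of the running VM.
erlang:system_info(otp_release).
%% ERTS (runtime system) version string.
erlang:system_info(version).
%% Version of the ssl application: {ok, Vsn}, or undefined if ssl is not loaded.
application:get_key(ssl, vsn).
```

The ssl application version is probably the most useful of the three, since
the suspected leak is in the SSL libraries rather than in the VM itself.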

Thanks again,
Nick

On Tue, Jul 28, 2015 at 12:53 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:

> Hello!
>
> lists:keyfind(size, 1, ets:info(element(2,
> lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <-
> ets:all()])))).
>
> returns:
>
> {size,996816}
>
> and it keeps growing and growing.
>
> So I think we hit our culprit!
>
> -----Original Message-----
> *From:* "Nick Marino"<nmar...@basho.com>
> *To:* "ジョハンガル"<gall.jo...@linecorp.com>;
> *Cc:* "riak-users"<riak-users@lists.basho.com>;
> *Sent:* 2015-07-28 (Tue) 02:37:26
> *Subject:* Re: fresh riak 2.1.1 install systematically slowing down and
> crashing in ~1day
>
> Hi,
>
> I have a strong suspicion that you're encountering a resource leak bug in
> the Erlang SSL libraries that Riak uses. By odd coincidence, I ran into a
> very similar issue at my last job working on a completely different
> project, and I helped develop a fix a couple of years ago. The patch was
> accepted somewhere in the Erlang R17 timeframe, but Riak doesn't support
> R17 yet so you probably don't have a version of Erlang with this particular
> fix in place.
>
> To check whether this resource leak is being hit, you can attach an Erlang
> shell to a node using the "riak attach" command and copy/paste this one
> line of code:
>
> lists:keyfind(size, 1, ets:info(element(2,
> lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <-
> ets:all()])))).
>
> Running the above command should give you something that looks like this:
>
> {size, 0}
>
> You will likely see a number larger than 0, but the general format should
> be the same. In normal usage, this number should be fairly small, but in
> your case you may see it continue to grow larger and larger over time
> (specifically, you may see it incrementing by one for each new incoming SSL
> connection, and never decrementing, even after connections are closed). If
> this value starts to get up into the tens or hundreds of thousands or more,
> establishing new SSL connections will start to get slower and slower, much
> like you're seeing.
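>
> As a rough sketch of how the growth could be confirmed (the Size fun and the
> 60-second interval are only illustrative choices, not something Riak
> provides), the same lookup can be sampled twice from the attached shell:
>
> ```erlang
> %% Sizes of every ETS table whose name is ssl_otp_cacertificate_db.
> Size = fun() ->
>     [ets:info(T, size) || T <- ets:all(),
>                           ets:info(T, name) =:= ssl_otp_cacertificate_db]
> end.
> %% Take two samples 60 seconds apart and return them as a pair.
> First = Size(), timer:sleep(60000), {First, Size()}.
> ```
>
> If the sizes in the second element are larger than those in the first while
> connections are being opened and closed, the table is most likely leaking.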
>
> If you can verify that this size value continues to grow over time, we can
> take a look at backporting the relevant fix to our custom Basho fork of
> Erlang. Let me know what you find, and we can take it from there.
>
> Thanks!
> Nick
>
> On Mon, Jul 27, 2015 at 5:59 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:
>
> Hello,
>
> I would really appreciate help with our authentication problem.
>
> We have developed an orchestration platform for internal cloud needs using
> Riak KV 2.1.1 (our first deployment of the technology).
> Up to now we were only using raw HTTP, but for security purposes we have
> been switching to TLS v1.2 with Protocol Buffers clients (the standard Riak
> Java client).
>
> At first everything works smoothly.
>
>
> Then, after a few hours under a load of about 10-15 authentications per
> second, our cluster's CPU usage slowly ramps up to saturation until the
> data store becomes unresponsive. (Since we cannot authenticate, there are
> no other requests.)
>
>
>
> Our cluster is currently composed of two 24-core Xeon machines with 64GB of
> RAM each and bonded 2b1Gbps NICs, running a standard, updated CentOS 6.6.
> RAM consumption doesn't go over 10%.
> We are currently storing a few megabytes of data at most.
>
> At first:
> curl -vvv -u **:** https://****
> completes the first two steps of the SSL handshake, CLIENT HELLO and SERVER
> HELLO, but the 3rd message (the client receiving CERTIFICATE from the
> server) gets slower and slower.
> Eventually the SSL handshake in the Riak Java client simply times out.
>
> Reading the logs alongside
> https://github.com/basho/riak_api/blob/develop/src/riak_api_pb_server.erl
> tells me that ssl:ssl_accept never returns (well, it eventually returns
> with Reason set to the atom closed, apparently because the client times out
> and closes the connection).
>
>
>
> supervisor:which_children(whereis(riak_api_pb_sup)) gives me a count of
> ~7000 processes.
> etop refuses to start.
>
>
>
> Have you experienced anything similar?
>
> As for our riak.conf configuration:
>
> ## Acceptable values:
> ##   - an integer
> erlang.async_threads = 64
> ring_size = 32
> .. ssl things setup ..
> storage_backend = multi
> ###
> multi_backend.default = bitcask_99
> ### 1h - ephemeral - no safety
> multi_backend.bitcask_1h.storage_backend = bitcask
> multi_backend.bitcask_1h.bitcask.expiry = 1h
> multi_backend.bitcask_1h.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_1h.bitcask.data_root = $(platform_data_dir)/bitcask_1h
> multi_backend.bitcask_1h.bitcask.max_file_size = 2GB
> multi_backend.bitcask_1h.bitcask.merge.thresholds.fragmentation = 99
> ### 3d - ephemeral - a weekend to restart data generator before auto expiry
> multi_backend.bitcask_3d.storage_backend = bitcask
> multi_backend.bitcask_3d.bitcask.expiry = 3d
> multi_backend.bitcask_3d.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3d.bitcask.data_root = $(platform_data_dir)/bitcask_3d
> multi_backend.bitcask_3d.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3d.bitcask.merge.thresholds.fragmentation = 99
> ### 3m - long term logs
> multi_backend.bitcask_3m.storage_backend = bitcask
> multi_backend.bitcask_3m.bitcask.expiry = 3m
> multi_backend.bitcask_3m.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3m.bitcask.data_root = $(platform_data_dir)/bitcask_3m
> multi_backend.bitcask_3m.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3m.bitcask.merge.thresholds.fragmentation = 45
> ### persistent - low amount of data
> multi_backend.bitcask_99.storage_backend = bitcask
> multi_backend.bitcask_99.bitcask.expiry = off
> multi_backend.bitcask_99.bitcask.data_root = $(platform_data_dir)/bitcask_99
> multi_backend.bitcask_99.bitcask.max_file_size = 128MB
> multi_backend.bitcask_99.bitcask.merge.thresholds.fragmentation = 20
>
> ### SECURITY RELATED CUSTOM CONF ###
>
> tls_protocols.sslv3 = off
> tls_protocols.tlsv1 = off
> tls_protocols.tlsv1.1 = on
> tls_protocols.tlsv1.2 = on
>
> secure_referer_check = on
>
> honor_cipher_order = on
> -----------------------------------------
> riak-admin security ciphers
> ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA256:ECDH-RSA-AES256-SHA384:ECDH-ECDSA-AES256-SHA384:AES256-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:DHE-RSA-AES128-SHA256:DHE-DSS-AES128-SHA256:ECDH-RSA-AES128-SHA256:ECDH-ECDSA-AES128-SHA256:AES128-SHA256
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>