Hi,

I have a strong suspicion that you're encountering a resource leak bug in
the Erlang SSL libraries that Riak uses. By odd coincidence, I ran into a
very similar issue at my last job working on a completely different
project, and I helped develop a fix a couple of years ago. The patch was
accepted somewhere in the Erlang R17 timeframe, but Riak doesn't support
R17 yet so you probably don't have a version of Erlang with this particular
fix in place.

To check whether this resource leak is being hit, you can attach an Erlang
shell to a node using the "riak attach" command and copy/paste this one
line of code:

lists:keyfind(size, 1, ets:info(element(2, lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <- ets:all()])))).

Running the above command should give you something that looks like this:

{size, 0}

You will likely see a number larger than 0, but the general format should
be the same. In normal usage, this number should be fairly small, but in
your case you may see it continue to grow larger and larger over time
(specifically, you may see it incrementing by one for each new incoming SSL
connection, and never decrementing, even after connections are closed). If
this value starts to get up into the tens or hundreds of thousands or more,
establishing new SSL connections will start to get slower and slower, much
like you're seeing.
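
If it helps, here is a minimal sketch for watching that value over time from
the same attached shell, rather than re-running the check by hand. The names
Size, IntervalMs and Samples, the 5-second interval, and the six samples are
my own illustrative choices, not anything Riak provides. Note that it blocks
the shell for about 30 seconds while it collects samples:

```erlang
%% Hypothetical helper: report the size of every ETS table whose name
%% attribute is ssl_otp_cacertificate_db (normally exactly one table).
Size = fun() -> [ets:info(T, size) || T <- ets:all(), ets:info(T, name) =:= ssl_otp_cacertificate_db] end.

%% Assumed sampling interval in milliseconds; adjust to taste.
IntervalMs = 5000.

%% Take six samples, pausing before each one, and pair each with a timestamp.
Samples = [begin timer:sleep(IntervalMs), {erlang:localtime(), Size()} end || _ <- lists:seq(1, 6)].
```

If the sizes in Samples only ever go up while connections are being opened
and closed, that would be consistent with the leak described above.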

If you can verify that this size value continues to grow over time, we can
take a look at backporting the relevant fix to our custom Basho fork of
Erlang. Let me know what you find, and we can take it from there.

Thanks!
Nick

On Mon, Jul 27, 2015 at 5:59 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:

> Hello,
>
> I would really appreciate help with our authentication problem.
>
> We have developed an orchestration platform for internal cloud needs using
> RIAK KV 2.1.1 (our first deployment of the technology).
> Up to now we were only using raw HTTP but for security purposes we have
> been switching to TLS v1.2 with Protocol Buffer clients (standard riak java
> client).
>
> At first everything works smoothly.
>
>
> Then eventually (after a few hours), with a load of about 10-15
> authentications per second, our cluster CPU usage slowly ramps up to
> saturation until the data store becomes unresponsive. (Since we cannot
> authenticate, there are no other requests.)
>
>
>
> Our cluster is currently composed of two 24-core Xeon machines with 64GB of
> RAM each and bonded 2b1Gbps NICs, running on standard, updated CentOS 6.6.
> RAM consumption doesn't go over 10%.
> We are currently storing a few megabytes of data at most.
>
> At first:
> curl -vvv -u **:** https://****
> will do the first 2 steps of SSL authentication, CLIENT HELLO and SERVER
> HELLO, and the 3rd message (client receiving CERTIFICATE from the server)
> will get slower and slower.
> Then the SSL handshake in the Riak Java client will simply time out.
>
> Reading the logs and combining with:
> https://github.com/basho/riak_api/blob/develop/src/riak_api_pb_server.erl
> tells me that ssl:ssl_accept never returns (well, it eventually returns
> with Reason as the atom closed, seemingly because the client times out and
> closes the connection).
>
>
>
> supervisor:which_children(whereis(riak_api_pb_sup)) gives me a count of
> ~7000 processes.
> etop refuses to start.
>
>
>
> Have you experienced anything similar?
>
> As for our riak.conf configuration:
>
> ## Acceptable values:
> ##   - an integer
> erlang.async_threads = 64
> ring_size = 32
> .. ssl things setup ..
> storage_backend = multi
> ###
> multi_backend.default = bitcask_99
> ### 1h - ephemeral - no safety
> multi_backend.bitcask_1h.storage_backend = bitcask
> multi_backend.bitcask_1h.bitcask.expiry = 1h
> multi_backend.bitcask_1h.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_1h.bitcask.data_root = $(platform_data_dir)/bitcask_1h
> multi_backend.bitcask_1h.bitcask.max_file_size = 2GB
> multi_backend.bitcask_1h.bitcask.merge.thresholds.fragmentation = 99
> ### 3d - ephemeral - a weekend to restart data generator before auto expiry
> multi_backend.bitcask_3d.storage_backend = bitcask
> multi_backend.bitcask_3d.bitcask.expiry = 3d
> multi_backend.bitcask_3d.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3d.bitcask.data_root = $(platform_data_dir)/bitcask_3d
> multi_backend.bitcask_3d.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3d.bitcask.merge.thresholds.fragmentation = 99
> ### 3m - long term logs
> multi_backend.bitcask_3m.storage_backend = bitcask
> multi_backend.bitcask_3m.bitcask.expiry = 3m
> multi_backend.bitcask_3m.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3m.bitcask.data_root = $(platform_data_dir)/bitcask_3m
> multi_backend.bitcask_3m.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3m.bitcask.merge.thresholds.fragmentation = 45
> ### persistent - low amount of data
> multi_backend.bitcask_99.storage_backend = bitcask
> multi_backend.bitcask_99.bitcask.expiry = off
> multi_backend.bitcask_99.bitcask.data_root = $(platform_data_dir)/bitcask_99
> multi_backend.bitcask_99.bitcask.max_file_size = 128MB
> multi_backend.bitcask_99.bitcask.merge.thresholds.fragmentation = 20
>
> ### SECURITY RELATED CUSTOM CONF ###
>
> tls_protocols.sslv3 = off
> tls_protocols.tlsv1 = off
> tls_protocols.tlsv1.1 = on
> tls_protocols.tlsv1.2 = on
>
> secure_referer_check = on
>
> honor_cipher_order = on
> -----------------------------------------
> riak-admin security ciphers
> ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA256:ECDH-RSA-AES256-SHA384:ECDH-ECDSA-AES256-SHA384:AES256-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:DHE-RSA-AES128-SHA256:DHE-DSS-AES128-SHA256:ECDH-RSA-AES128-SHA256:ECDH-ECDSA-AES128-SHA256:AES128-SHA256
>
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>