Great! Thanks for your help in verifying this issue. I'll look into getting
a patch backported to Basho's Erlang fork.

Just to verify, what is the exact version of Erlang you are running? If it
seems like it might take a while to get a patched version of Erlang vetted
and built, I could potentially build a quick temporary patch that you could
try installing on your system, which may fix the problem in the short term.
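
In case it helps, here is one way to read the version from an attached shell.
This is only a sketch, assuming "riak attach" still works on the affected
node; the exact strings returned depend on your build.

```erlang
%% Run these one at a time in the shell opened by "riak attach".

%% OTP release string of the running VM.
erlang:system_info(otp_release).

%% ERTS (runtime system) version string.
erlang:system_info(version).

%% Version of the ssl application: {ok, Vsn} if it is loaded,
%% undefined otherwise.
application:get_key(ssl, vsn).
```

The ssl application version is probably the most useful of the three, since
the suspected leak is in the SSL libraries rather than in the VM itself.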

> Hello!
> lists:keyfind(size, 1, ets:info(element(2,
> lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <-
> ets:all()])))).
> returns:
> {size,996816}
> and it keeps growing and growing
> So I think we hit our culprit!
>
> Hi,
> I have a strong suspicion that you're encountering a resource leak bug in
> the Erlang SSL libraries that Riak uses. By odd coincidence, I ran into a
> very similar issue at my last job working on a completely different
> project, and I helped develop a fix a couple of years ago. The patch was
> accepted somewhere in the Erlang R17 timeframe, but Riak doesn't support
> R17 yet so you probably don't have a version of Erlang with this particular
> fix in place.
> To check whether this resource leak is being hit, you can attach an Erlang
> shell to a node using the "riak attach" command and copy/paste this one
> line of code:
> lists:keyfind(size, 1, ets:info(element(2,
> lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <-
> ets:all()])))).
> Running the above command should give you something that looks like this:
> {size, 0}
> You will likely see a number larger than 0, but the general format should
> be the same. In normal usage, this number should be fairly small, but in
> your case you may see it continue to grow larger and larger over time
> (specifically, you may see it incrementing by one for each new incoming SSL
> connection, and never decrementing, even after connections are closed). If
> this value starts to get up into the tens or hundreds of thousands or more,
> establishing new SSL connections will start to get slower and slower, much
> like you're seeing.
> If you can verify that this size value continues to grow over time, we can
> take a look at backporting the relevant fix to our custom Basho fork of
> Erlang. Let me know what you find, and we can take it from there.
>
> Hello,
> I would really appreciate help with our authentication problem.
> We have developed an orchestration platform for internal cloud needs using
> Riak KV 2.1.1 (our first deployment of the technology).
> Up to now we were only using raw HTTP, but for security purposes we have
> been switching to TLS v1.2 with Protocol Buffers clients (the standard Riak
> Java client).
> At first everything works smoothly.
> Then, after a few hours under a load of about 10-15 authentications per
> second, our cluster CPU usage slowly ramps up to saturation until the
> data store becomes unresponsive. (Since we cannot authenticate, there are
> no other requests.)
> Our cluster currently consists of two 24-core Xeon machines with 64GB of
> RAM each and bonded 2x1Gbps NICs, running a standard, up-to-date CentOS 6.6.
> RAM consumption doesn't go over 10%.
> We are currently storing a few megabytes of data at most.
> At first:
> curl -vvv -u **:** https://****
> will complete the first two steps of the SSL handshake, CLIENT HELLO and
> SERVER HELLO, and then the third message (the client receiving CERTIFICATE
> from the server) gets slower and slower.
> Then the SSL handshake in the Riak Java client simply times out.
> Reading the logs and combining with:
> tells me that ssl:ssl_accept never returns (well, it eventually returns
> with Reason as the atom closed, seemingly because the client times out and
> closes the connection).
> supervisor:which_children(whereis(riak_api_pb_sup)) gives me a count of
> ~7000 processes.
> etop refuses to start.
> Have you experienced anything similar?
> As for our riak.conf configuration:
> ## Acceptable values:
> ##   - an integer
> erlang.async_threads = 64
> ring_size = 32
> .. ssl things setup ..
> storage_backend = multi
> ###
> multi_backend.default = bitcask_99
> ### 1h - ephemeral - no safety
> multi_backend.bitcask_1h.storage_backend = bitcask
> multi_backend.bitcask_1h.bitcask.expiry = 1h
> multi_backend.bitcask_1h.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_1h.bitcask.data_root = $(platform_data_dir)/bitcask_1h
> multi_backend.bitcask_1h.bitcask.max_file_size = 2GB
> multi_backend.bitcask_1h.bitcask.merge.thresholds.fragmentation = 99
> ### 3d - ephemeral - a weekend to restart data generator before auto expiry
> multi_backend.bitcask_3d.storage_backend = bitcask
> multi_backend.bitcask_3d.bitcask.expiry = 3d
> multi_backend.bitcask_3d.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3d.bitcask.data_root = $(platform_data_dir)/bitcask_3d
> multi_backend.bitcask_3d.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3d.bitcask.merge.thresholds.fragmentation = 99
> ### 3m - long term logs
> multi_backend.bitcask_3m.storage_backend = bitcask
> multi_backend.bitcask_3m.bitcask.expiry = 3m
> multi_backend.bitcask_3m.bitcask.expiry.grace_time = 1h
> multi_backend.bitcask_3m.bitcask.data_root = $(platform_data_dir)/bitcask_3m
> multi_backend.bitcask_3m.bitcask.max_file_size = 2GB
> multi_backend.bitcask_3m.bitcask.merge.thresholds.fragmentation = 45
> ### persistent - low amount of data
> multi_backend.bitcask_99.storage_backend = bitcask
> multi_backend.bitcask_99.bitcask.expiry = off
> multi_backend.bitcask_99.bitcask.data_root = $(platform_data_dir)/bitcask_99
> multi_backend.bitcask_99.bitcask.max_file_size = 128MB
> multi_backend.bitcask_99.bitcask.merge.thresholds.fragmentation = 20
> tls_protocols.sslv3 = off
> tls_protocols.tlsv1 = off
> tls_protocols.tlsv1.1 = on
> tls_protocols.tlsv1.2 = on
> secure_referer_check = on
> honor_cipher_order = on
> -----------------------------------------
> riak-admin security ciphers
