Hello!
 
On CentOS 6.6:
find / -name "erl"
/usr/lib64/riak/erts-5.10.3/bin/erl
 
and then we get
 
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:24:24] [async-threads:10] [kernel-poll:false] [frame-pointer]
 
 
Many thanks!
Johan Gall 
 
-----Original Message-----
From: "Nick Marino"<nmar...@basho.com> 
To: "ジョハンガル"<gall.jo...@linecorp.com>; 
Cc: "崔榮虎"<youngho.c...@linecorp.com>; 
"riak-users"<riak-users@lists.basho.com>; 
Sent: 2015-07-29 (Wed) 03:18:13
Subject: Re: fresh riak 2.1.1 install systematically slowing down and crashing in ~1day
 
Great! Thanks for your help in verifying this issue. I'll look into getting a 
patch backported to Basho's Erlang fork. Just to verify, what is the exact 
version of Erlang you are running? If it seems like it might take a while to 
get a patched version of Erlang vetted and built, I could potentially build a 
quick temporary patch that you could try installing on your system, which may 
fix the problem in the short term.

Thanks again,
Nick

On Tue, Jul 28, 2015 at 12:53 AM, ジョハンガル <gall.jo...@linecorp.com> wrote:
 Hello!
 
lists:keyfind(size, 1, ets:info(element(2, 
lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T 
<- ets:all()])))).
 
returns:
{size,996816}
and it keeps growing and growing
 
So I think we hit our culprit! 
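
As a rough cross-check, the reported size can be compared with the load from the original report (about 10-15 authentications per second). The sketch below assumes, as Nick's message suggests, that the table gains one entry per incoming SSL connection and never shrinks. The helper name hours_to_reach is hypothetical, and the result is only an order-of-magnitude estimate because the real connection rate and node uptime are not known.

```shell
#!/bin/sh
# Assumption: one leaked table entry per SSL connection, never removed.
size=996816   # value returned by the ets:info query above

# hours_to_reach ENTRIES RATE_PER_SEC
# Prints the whole hours needed to accumulate ENTRIES (integer division).
hours_to_reach() {
    echo $(( $1 / $2 / 3600 ))
}

echo "at 10 conn/s: ~$(hours_to_reach "$size" 10) h"   # ~27 h
echo "at 15 conn/s: ~$(hours_to_reach "$size" 15) h"   # ~18 h
```

Both figures land in the neighborhood of a day, which is consistent with the "~1day" in the subject line.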
 
-----Original Message-----
From: "Nick Marino"<nmar...@basho.com> 
To: "ジョハンガル"<gall.jo...@linecorp.com>; 
Cc: "riak-users"<riak-users@lists.basho.com>; 
Sent: 2015-07-28 (Tue) 02:37:26
Subject: Re: fresh riak 2.1.1 install systematically slowing down and crashing in ~1day
 
Hi,

I have a strong suspicion that you're encountering a resource leak bug in
the Erlang SSL libraries that Riak uses. By odd coincidence, I ran into a very
similar issue at my last job working on a completely different project, and I
helped develop a fix a couple of years ago. The patch was accepted somewhere in
the Erlang R17 timeframe, but Riak doesn't support R17 yet, so you probably
don't have a version of Erlang with this particular fix in place.

To check whether this resource leak is being hit, you can attach an Erlang
shell to a node using the "riak attach" command and copy/paste this one line
of code:

lists:keyfind(size, 1, ets:info(element(2, lists:keyfind(ssl_otp_cacertificate_db, 1, [{ets:info(T, name), T} || T <- ets:all()])))).

Running the above command should give you something that looks like this:

{size, 0}

You will likely see a number larger than 0, but the general format should be
the same. In normal usage, this number should be fairly small, but in your
case you may see it continue to grow larger and larger over time
(specifically, you may see it incrementing by one for each new incoming SSL
connection, and never decrementing, even after connections are closed). If
this value starts to get up into the tens or hundreds of thousands or more,
establishing new SSL connections will start to get slower and slower, much
like you're seeing.

If you can verify that this size value continues to grow over time, we can
take a look at backporting the relevant fix to our custom Basho fork of
Erlang. Let me know what you find, and we can take it from there.

Thanks!
Nick

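
To tell a steadily growing table from a merely large one, it helps to compare two readings taken some time apart. The following is a minimal shell sketch, assuming you paste in the {size,N} tuples printed by the one-liner above along with the number of seconds between the two readings. The function names ets_size and ets_growth_per_sec are hypothetical helpers, not part of Riak or Erlang, and the sample values in the last line are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract N from a "{size,N}" or "{size, N}" tuple
# as printed by the Erlang shell.
ets_size() {
    printf '%s\n' "$1" | sed -n 's/^{size, *\([0-9][0-9]*\)}$/\1/p'
}

# Average growth in entries per second between two readings.
# Args: first tuple, second tuple, seconds elapsed between the readings.
ets_growth_per_sec() {
    first=$(ets_size "$1")
    second=$(ets_size "$2")
    echo $(( (second - first) / $3 ))
}

# Made-up sample: two readings taken one hour apart.
ets_growth_per_sec '{size,1000}' '{size,46000}' 3600   # prints 12 (integer division)
```

A growth rate that tracks the rate of incoming SSL connections, and never turns negative after connections close, would match the leak described above.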
On Mon, Jul 27, 2015 at 5:59 AM, ジョハンガル <gall.jo...@linecorp.com> wrote: 
Hello,

I would really appreciate help with our authentication problem.

We have developed an orchestration platform for internal cloud needs using
Riak KV 2.1.1 (our first deployment of the technology).
Up to now we were only using raw HTTP, but for security purposes we have been
switching to TLS v1.2 with Protocol Buffers clients (the standard Riak Java
client).

At first everything works smoothly.

Then, eventually (after a few hours), with a load of about 10-15
authentications per second, our cluster's CPU usage starts to slowly ramp up
to saturation until the data store turns unresponsive. (Since we cannot
authenticate, there are no other requests.)
 
Our cluster is currently composed of two 24-core Xeon machines with 64GB of
RAM each and bonded 2x1Gbps NICs, running on a standard, updated CentOS 6.6.
RAM consumption doesn't go over 10%.
We are currently storing a few megabytes of data at most.

At first:
curl -vvv -u **:** https://****
will do the first 2 steps of the SSL handshake, CLIENT HELLO and SERVER
HELLO, and the 3rd message (the client receiving CERTIFICATE from the server)
will get slower and slower and slower.
Then the SSL handshake in the Riak Java client will simply time out.

Reading the logs and combining them with:
https://github.com/basho/riak_api/blob/develop/src/riak_api_pb_server.erl
tells me that ssl:ssl_accept never returns (well, it eventually returns with
Reason as the atom closed, seemingly because the client times out and closes
the connection).
 
supervisor:which_children(whereis(riak_api_pb_sup)) gives me a count of ~7000 
processes.
etop refuses to start.
 
Have you experienced anything similar?

As for our riak.conf configuration:
## Acceptable values:
##   - an integer
erlang.async_threads = 64
ring_size = 32
.. ssl things setup ..
storage_backend = multi
###
multi_backend.default = bitcask_99
### 1h - ephemeral - no safety
multi_backend.bitcask_1h.storage_backend = bitcask
multi_backend.bitcask_1h.bitcask.expiry = 1h
multi_backend.bitcask_1h.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_1h.bitcask.data_root = $(platform_data_dir)/bitcask_1h
multi_backend.bitcask_1h.bitcask.max_file_size = 2GB
multi_backend.bitcask_1h.bitcask.merge.thresholds.fragmentation = 99
### 3d - ephemeral - a weekend to restart data generator before auto expiry
multi_backend.bitcask_3d.storage_backend = bitcask
multi_backend.bitcask_3d.bitcask.expiry = 3d
multi_backend.bitcask_3d.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_3d.bitcask.data_root = $(platform_data_dir)/bitcask_3d
multi_backend.bitcask_3d.bitcask.max_file_size = 2GB
multi_backend.bitcask_3d.bitcask.merge.thresholds.fragmentation = 99
### 3m - long term logs
multi_backend.bitcask_3m.storage_backend = bitcask
multi_backend.bitcask_3m.bitcask.expiry = 3m
multi_backend.bitcask_3m.bitcask.expiry.grace_time = 1h
multi_backend.bitcask_3m.bitcask.data_root = $(platform_data_dir)/bitcask_3m
multi_backend.bitcask_3m.bitcask.max_file_size = 2GB
multi_backend.bitcask_3m.bitcask.merge.thresholds.fragmentation = 45
### persistent - low amount of data
multi_backend.bitcask_99.storage_backend = bitcask
multi_backend.bitcask_99.bitcask.expiry = off
multi_backend.bitcask_99.bitcask.data_root = $(platform_data_dir)/bitcask_99
multi_backend.bitcask_99.bitcask.max_file_size = 128MB
multi_backend.bitcask_99.bitcask.merge.thresholds.fragmentation = 20

### SECURITY RELATED CUSTOM CONF ###

tls_protocols.sslv3 = off
tls_protocols.tlsv1 = off
tls_protocols.tlsv1.1 = on
tls_protocols.tlsv1.2 = on

secure_referer_check = on

honor_cipher_order = on
-----------------------------------------
riak-admin security ciphers ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA256:ECDH-RSA-AES256-SHA384:ECDH-ECDSA-AES256-SHA384:AES256-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:DHE-RSA-AES128-SHA256:DHE-DSS-AES128-SHA256:ECDH-RSA-AES128-SHA256:ECDH-ECDSA-AES128-SHA256:AES128-SHA256
 

 _______________________________________________

riak-users mailing list

riak-users@lists.basho.com

http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


 



 

