Re: Very slow acquisition time (99 percentile) while fast median times

2016-05-04 Thread Guillaume Boddaert
Thanks, I've installed the new library as stated in the documentation 
using 2.0.18 files.


I was unable to find the vnode LOG file mentioned in the documentation, as my 
vnodes look like files, not directories. So I can't verify that I am running 
the proper version of the library after my Riak restart.


Anyway, it unfortunately has no effect:
http://www.awesomescreenshot.com/image/1219821/1b292613c051da86df5696034c114b14

I think I'll try adding a sixth node that doesn't rely on network disks and 
see what happens.


G.


On 03/05/2016 22:47, Matthew Von-Maszewski wrote:

Guillaume,

A prebuilt eleveldb 2.0.18 for Debian 7 is found here:

  * https://s3.amazonaws.com/downloads.basho.com/patches/eleveldb/2.0.18/eleveldb_2.0.18_debian7.tgz


There are good instructions for applying an eleveldb patch here:

http://docs.basho.com/community/productadvisories/leveldbsegfault/#patch-eleveldb-so

Key points about the above web page:

- use the eleveldb patch file link in this email, NOT links on the web 
page


- the Debian directory listed on the web page will be slightly 
different than your Riak 2.1.4 installation:

/usr/lib/riak/lib/eleveldb-/priv/


Matthew


On May 3, 2016, at 1:01 PM, Matthew Von-Maszewski wrote:


Guillaume,

I have reviewed the debug package for your riak1 server.  There are 
two potential areas of follow-up:


1.  You are running our most recent Riak 2.1.4 which has eleveldb 
2.0.17.  We have seen a case where a recent feature in eleveldb 
2.0.17 caused too much cache flushing, impacting leveldb’s 
performance.  A discussion is here:


https://github.com/basho/leveldb/wiki/mv-timed-grooming2

2.  Yokozuna search was recently updated to fix some timeout problems. 
Those updates are not yet in a public build.  One of our other 
engineers is likely to respond to you on that topic.



An eleveldb 2.0.18 is tagged and available via github if you want to 
build it yourself.  Otherwise, Basho may be releasing prebuilt 
patches of eleveldb 2.0.18 in the near future.  But no date is 
currently set.


Matthew

On May 3, 2016, at 10:50 AM, Luke Bakken wrote:


Guillaume -

You said earlier "My data are stored on an openstack volume that
support up to 3000IOPS". There is a likelihood that your write load is
exceeding the capacity of your virtual environment, especially if some
Riak nodes are sharing physical disk or server infrastructure.

Some suggestions:

* If you're not using Riak Search, set "search = off" in riak.conf

* Be sure to carefully read and apply all tunings:
http://docs.basho.com/riak/kv/2.1.4/using/performance/

* You may wish to increase the memory dedicated to leveldb:
http://docs.basho.com/riak/kv/2.1.4/configuring/backend/#leveldb

--
Luke Bakken
Engineer
lbak...@basho.com


On Tue, May 3, 2016 at 7:33 AM, Guillaume Boddaert wrote:

Hi,

Sorry for the delay, I've spent a lot of time trying to understand whether
the problem was elsewhere.
I've simplified my infrastructure to a simple layout that no longer relies
on a load balancer, and I also corrected some minor performance issues on
my workers.

At the moment, I have up to 32 workers calling Riak for writes. Each of
them is set to:
w=1
dw=0
timeout=1000
using protobuf
a timed-out attempt is retried 180s later

From my application server's perspective, 23% of the calls fail with a
timeout (75446 tries, 57564 successes, 17578 timeouts).
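
The 23% figure can be rechecked from the three counts quoted above. This is only a small arithmetic sketch; the variable names are illustrative and nothing here comes from the application itself. It also shows that the success and timeout counts do not quite add up to the number of tries.

```python
# Counts as reported in the message above.
tries = 75446
successes = 57564
timeouts = 17578

timeout_rate = timeouts / tries
success_rate = successes / tries
# Calls that are neither a success nor a timeout in the reported counts.
unaccounted = tries - successes - timeouts

print(f"timeout rate: {timeout_rate:.1%}")
print(f"success rate: {success_rate:.1%}")
print(f"unaccounted calls: {unaccounted}")
```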

Here is a sample riak-admin stat for one of my 5 hosts:

node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
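
To relate these numbers to the 1000 ms client timeout, here is a rough sketch that converts the node-level values to milliseconds. It assumes the stats are reported in microseconds; the dictionary and helper names are made up for illustration.

```python
TIMEOUT_MS = 1000  # client-side timeout from the worker settings above

# Values copied from the riak-admin output above; assumed to be microseconds.
node_put_fsm_time_us = {
    "median": 20235,
    "mean": 156242,
    "95": 773682,
    "99": 959444,
    "100": 999331,
}


def to_ms(microseconds):
    """Convert a microsecond value to milliseconds."""
    return microseconds / 1000.0


for name, value in node_put_fsm_time_us.items():
    ms = to_ms(value)
    share = ms / TIMEOUT_MS
    print(f"node_put_fsm_time_{name}: {ms:.1f} ms ({share:.0%} of timeout)")

# How much slower the 99th percentile is than the median.
tail_ratio = node_put_fsm_time_us["99"] / node_put_fsm_time_us["median"]
print(f"p99 / median: {tail_ratio:.1f}x")
```

Under that assumption the median put takes about 20 ms, while the 99th percentile and the maximum sit just under the 1000 ms timeout.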

I am using leveldb, so I can't tune the bitcask backend as suggested.

I've changed the vmdirty settings and enabled them:
admin@riak1:~$ sudo sysctl -a | grep dirty
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
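
For readers unfamiliar with the units, a short sketch converting the non-zero settings above into more readable values. The byte value is converted to MiB and the centisecond values to seconds; the names are illustrative only.

```python
# Values copied from the sysctl output above.
settings = {
    "vm.dirty_background_bytes": 209715200,
    "vm.dirty_writeback_centisecs": 100,
    "vm.dirty_expire_centisecs": 200,
}

# Bytes to MiB, and centiseconds (1/100 s) to seconds.
background_mib = settings["vm.dirty_background_bytes"] / (1024 * 1024)
writeback_s = settings["vm.dirty_writeback_centisecs"] / 100
expire_s = settings["vm.dirty_expire_centisecs"] / 100

print(f"background writeback starts at {background_mib:.0f} MiB of dirty pages")
print(f"flusher threads wake every {writeback_s:.0f} s")
print(f"dirty pages become eligible for writeback after {expire_s:.0f} s")
```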

I've seen less idle time between writes; iostat is showing near-constant
writes between 20 and 500 kb/s, with some surges around 4000 kb/s. That's
better, but not that great.

Here is the current configuration for my "activity_fr" bucket type and
"tweet" bucket:


admin@riak1:~$ http localhost:8098/types/activity_fr/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 314
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:21 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding
{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,

Re: Very slow acquisition time (99 percentile) while fast median times

2016-05-04 Thread Matthew Von-Maszewski
Guillaume,

Two points:

1.  You can send the “riak debug” from one server and I will verify that 2.0.18 
is indicated in the LOG file.

2.  Your previous “riak debug” from server “riak1” indicated that only two CPU 
cores existed.  We performance test with eight, twelve, and twenty-four core 
servers, not two.  You have two heavyweight applications, Riak and Solr, 
competing for time on those two cores.  Actually, you have three applications 
once you count leveldb’s background compaction operations.

One leveldb compaction is CPU intensive.  The compaction reads a block from the 
disk, computes a CRC32 check of the block, decompresses the block, merges the 
keys of this block with one or more blocks from other files, then compresses 
the new block, computes a new CRC32, and finally writes the block to disk.  And 
there can be multiple compactions running simultaneously.  All of your CPU time 
could be periodically lost to leveldb compactions.
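
To make the per-block work concrete, here is a toy sketch of the steps listed above (checksum, decompress, merge, compress, checksum). It is not leveldb's code: zlib's CRC32 and compression stand in for whatever leveldb actually uses, the JSON block layout is invented for the example, and all function names are hypothetical.

```python
import json
import zlib


def write_block(entries):
    """Serialize, compress, and checksum a block of key/value pairs."""
    raw = json.dumps(sorted(entries.items())).encode("utf-8")
    compressed = zlib.compress(raw)
    return {"crc": zlib.crc32(compressed), "data": compressed}


def read_block(block):
    """Verify the checksum, then decompress and decode a block."""
    if zlib.crc32(block["data"]) != block["crc"]:
        raise ValueError("block checksum mismatch")
    return dict(json.loads(zlib.decompress(block["data"]).decode("utf-8")))


def compact(newer, older):
    """Merge two blocks into one; the newer block wins on duplicate keys."""
    merged = read_block(older)
    merged.update(read_block(newer))
    return write_block(merged)


older = write_block({"a": "1", "b": "2"})
newer = write_block({"b": "3", "c": "4"})
result = read_block(compact(newer, older))
print(result)
```

Every step except the final write is pure CPU work, which is why several concurrent compactions can saturate a two-core machine.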

There are some minor tunings we could do, like disabling compression in 
leveldb, that might help.  But I seriously doubt you are going to achieve your 
desired results with only two cores.  Adding a sixth server with two cores is 
not really going to help.

Matthew



Riak-cs / stanchion won't find credentials

2016-05-04 Thread Jhonny Everson
Hi,

I am setting up a new cluster. I followed all the setup instructions (I
think). I created the admin user as the docs say, then updated riak-cs.conf
and stanchion.conf with the generated keys. I get the following when starting:

2016-05-05 01:15:01.167 [error] <0.149.0>@riak_cs_app:fetch_and_cache_admin_creds:96 Couldn't get admin user (LMTLWU8QZ_UZZJ4Y541) record: {error,notfound}
2016-05-05 01:15:01.199 [error] <0.149.0>@riak_cs_app:sanity_check:129 Admin credentials are not properly set: notfound.


If I revert to the default ('admin.key = admin-key'), then it starts OK.
If I try to create the user again, it says the email already exists, so
the user is there.

I looked at the logs and didn't find anything that seems relevant other
than the entries I just posted. Can someone please help me dig into this issue?

-- 
Jhonny Everson
___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Riak-cs / stanchion won't find credentials

2016-05-04 Thread Jhonny Everson
Also, just for reference, I am using the latest riak/riak-cs/stanchion
packages available on packagecloud.




-- 
Jhonny Everson