Hi,
Sorry for the delay. I've spent a lot of time trying to understand
whether the problem was elsewhere.
I've simplified my infrastructure down to a simple layout that no
longer relies on a load balancer, and I've also corrected some minor
performance issues on my workers.
At the moment, I have up to 32 workers calling Riak for writes.
Each of them is set to:
w=1
dw=0
timeout=1000
using protobuf
a timed-out attempt is retried 180s later
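To make the retry policy concrete, here is a minimal sketch of it in Python. It is not my actual worker code: `simulate`, the `put` callable and `max_attempts` are hypothetical names for illustration, and time is simulated rather than slept.

```python
import heapq


def simulate(keys, put, retry_delay_s=180, max_attempts=5):
    """Simulate the write policy: one attempt per key, retried later on timeout.

    `put(key, attempt)` is a hypothetical stand-in for the real Riak write
    (w=1, dw=0, timeout=1000 over protobuf); it returns True on success and
    False on timeout. Returns {key: simulated second at which it succeeded}.
    """
    # Each entry is (due_time_s, sequence_number, key, attempt_number).
    # The unique sequence number keeps heap ordering stable for equal times.
    queue = [(0, n, key, 1) for n, key in enumerate(keys)]
    heapq.heapify(queue)
    seq = len(queue)
    done = {}
    while queue:
        at, _, key, attempt = heapq.heappop(queue)
        if put(key, attempt):
            done[key] = at
        elif attempt < max_attempts:
            # Timed out: schedule the same key again retry_delay_s later.
            heapq.heappush(queue, (at + retry_delay_s, seq, key, attempt + 1))
            seq += 1
    return done


if __name__ == "__main__":
    # Pretend key "b" times out on its first attempt only.
    flaky = lambda key, attempt: not (key == "b" and attempt == 1)
    print(simulate(["a", "b"], flaky))
```

The `max_attempts` cap is my own addition so the sketch always terminates; the real workers simply rerun the call 180s later.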
From my application server's perspective, 23% of the calls are
rejected by a timeout (75446 tries, 57564 successes, 17578 timeouts).
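As a quick check of the figures above, here is a small Python sketch of the arithmetic (the function and variable names are only for illustration). Note that successes plus timeouts do not add up exactly to the number of tries; the sketch reports the difference without assuming what it corresponds to.

```python
def summarize(tries, successes, timeouts):
    """Return the timeout rate, success rate and unaccounted-for calls."""
    return {
        "timeout_rate": timeouts / tries,
        "success_rate": successes / tries,
        # Calls that are counted as neither a success nor a timeout.
        "unaccounted": tries - successes - timeouts,
    }


if __name__ == "__main__":
    summary = summarize(tries=75446, successes=57564, timeouts=17578)
    print(f"timeout rate: {summary['timeout_rate']:.1%}")
    print(f"success rate: {summary['success_rate']:.1%}")
    print(f"unaccounted calls: {summary['unaccounted']}")
```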
Here is a sample riak-admin stat for one of my 5 hosts:
node_put_fsm_time_100 : 999331
node_put_fsm_time_95 : 773682
node_put_fsm_time_99 : 959444
node_put_fsm_time_mean : 156242
node_put_fsm_time_median : 20235
vnode_put_fsm_time_100 : 5267527
vnode_put_fsm_time_95 : 2437457
vnode_put_fsm_time_99 : 4819538
vnode_put_fsm_time_mean : 175567
vnode_put_fsm_time_median : 6928
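To compare these numbers with the 1000 ms client timeout, here is a small Python sketch that converts them to milliseconds. It assumes the `*_fsm_time_*` values are in microseconds, which is how the Riak documentation describes the `node_put_fsm_time` statistics; the helper names are hypothetical.

```python
# Values copied from the riak-admin output above (assumed to be microseconds).
STATS_US = {
    "node_put_fsm_time_100": 999331,
    "node_put_fsm_time_95": 773682,
    "node_put_fsm_time_99": 959444,
    "node_put_fsm_time_mean": 156242,
    "node_put_fsm_time_median": 20235,
    "vnode_put_fsm_time_100": 5267527,
    "vnode_put_fsm_time_95": 2437457,
    "vnode_put_fsm_time_99": 4819538,
    "vnode_put_fsm_time_mean": 175567,
    "vnode_put_fsm_time_median": 6928,
}


def to_ms(microseconds):
    """Convert microseconds to milliseconds."""
    return microseconds / 1000.0


def over_timeout(stats_us, timeout_ms):
    """Return the stat names whose value is at or above the timeout."""
    return [name for name, us in stats_us.items() if to_ms(us) >= timeout_ms]


if __name__ == "__main__":
    for name, us in STATS_US.items():
        print(f"{name}: {to_ms(us):.1f} ms")
    print("at or above 1000 ms:", over_timeout(STATS_US, timeout_ms=1000))
```

Under that assumption, the slowest node-level put (about 999 ms) sits right at the client timeout.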
I am using LevelDB, so I can't tune the Bitcask backend as suggested.
I've changed the vm.dirty settings and enabled them:
admin@riak1:~$ sudo sysctl -a | grep dirty
vm.dirty_background_ratio = 0
vm.dirty_background_bytes = 209715200
vm.dirty_ratio = 40
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 100
vm.dirty_expire_centisecs = 200
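For readability, here is a short Python sketch that converts these values into friendlier units. It relies only on the kernel's units for these settings (bytes and hundredths of a second); the helper names are mine.

```python
# Values copied from the sysctl output above.
SETTINGS = {
    "vm.dirty_background_bytes": 209715200,
    "vm.dirty_writeback_centisecs": 100,
    "vm.dirty_expire_centisecs": 200,
}


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


def centisecs_to_seconds(centisecs):
    """Convert hundredths of a second to seconds."""
    return centisecs / 100


if __name__ == "__main__":
    background = bytes_to_mib(SETTINGS["vm.dirty_background_bytes"])
    writeback = centisecs_to_seconds(SETTINGS["vm.dirty_writeback_centisecs"])
    expire = centisecs_to_seconds(SETTINGS["vm.dirty_expire_centisecs"])
    print(f"background writeback starts at {background:.0f} MiB of dirty data")
    print(f"flusher threads wake up every {writeback:.0f} s")
    print(f"dirty data becomes eligible for writeback after {expire:.0f} s")
```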
I've seen less idle time between writes: iostat now shows
near-constant writes between 20 and 500 kb/s, with some surges around
4000 kb/s. That's better, but not that great.
Here is the current configuration for my "activity_fr" bucket type and
"tweet" bucket:
admin@riak1:~$ http localhost:8098/types/activity_fr/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 314
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:21 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding
{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}
admin@riak1:~$ http localhost:8098/types/activity_fr/buckets/tweet/props
HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 322
Content-Type: application/json
Date: Tue, 03 May 2016 14:30:02 GMT
Server: MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)
Vary: Accept-Encoding
{
    "props": {
        "active": true,
        "allow_mult": false,
        "basic_quorum": false,
        "big_vclock": 50,
        "chash_keyfun": {
            "fun": "chash_std_keyfun",
            "mod": "riak_core_util"
        },
        "claimant": "r...@riak2.lighthouse-analytics.co",
        "dvv_enabled": false,
        "dw": "quorum",
        "last_write_wins": true,
        "linkfun": {
            "fun": "mapreduce_linkfun",
            "mod": "riak_kv_wm_link_walker"
        },
        "n_val": 3,
        "name": "tweet",
        "notfound_ok": true,
        "old_vclock": 86400,
        "postcommit": [],
        "pr": 0,
        "precommit": [],
        "pw": 0,
        "r": "quorum",
        "rw": "quorum",
        "search_index": "activity_fr.20160422104506",
        "small_vclock": 50,
        "w": "quorum",
        "young_vclock": 20
    }
}
I really don't know what to do. Can you help?
Guillaume