Hi all, I've been experiencing stalls with my riak cluster where riak won't return any data (queries time out). Here are some basic details:

- 8 nodes
- riak 1.4.10 (upgraded from 1.4.6 -> 1.4.8 -> 1.4.10)
- leveldb backend
- n_val is 2
- allow_mult is false
- ec2 i2.2xlarge boxes (8 cores, 61 GB RAM, 800 GB disk space)
- about 33% disk space utilization per node

The riak cluster will stall for as long as a few minutes at a time, but will otherwise work as expected for hours. There doesn't seem to be an obvious pattern as to when the stalls happen.

My first thought was that the stalls may be related to AAE, but I've disabled that via 'riak attach' and the settings file. Sidenote, I still see messages like:

2015-01-05 12:24:04.666 [info] <0.574.0>@riak_kv_entropy_manager:perhaps_log_throttle_change:826 Changing AAE throttle from 0 -> 10 msec/key, based on maximum vnode mailbox size 209 from 'riak-user@riak-host'

which makes me question whether AAE is actually turned off.

Now I'm leaning towards leveldb compactions being the issue. What can I do to verify this is the issue, and how can I fix it?

I see log messages about large objects:

2015-01-05 16:11:28.046 [warning] <0.6398.0>@riak_kv_vnode:encode_and_put_no_sib_check:1830 Writing very large object (11307735 bytes) to <<"BucketName">>/<<"keys_1420466400">>

Could these be causing longer-running compactions, or more frequent compactions?

Thanks for reading,
Andy
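
One way to check whether the stalls line up with the two kinds of log messages quoted above is to count them per minute and compare against the times the queries timed out. This is only a rough sketch: the regular expressions are modeled solely on the two sample lines in this post, the function name summarize is made up for illustration, and it says nothing about leveldb compaction itself.

```python
import re
from collections import Counter

# Patterns modeled only on the two log lines quoted in the post.
# Other Riak log formats are not covered (assumption).
LARGE_OBJECT = re.compile(
    r"(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ \[warning\] .*"
    r"Writing very large object \((?P<size>\d+) bytes\)"
)
AAE_THROTTLE = re.compile(
    r"(?P<minute>\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d+ \[info\] .*"
    r"Changing AAE throttle from (?P<old>\d+) -> (?P<new>\d+) msec/key"
)


def summarize(lines):
    """Count large-object writes and AAE throttle changes per minute.

    Returns (large_per_minute, throttle_per_minute, largest_object_bytes).
    """
    large = Counter()
    throttle = Counter()
    largest = 0
    for line in lines:
        m = LARGE_OBJECT.match(line)
        if m:
            large[m.group("minute")] += 1
            largest = max(largest, int(m.group("size")))
            continue
        m = AAE_THROTTLE.match(line)
        if m:
            throttle[m.group("minute")] += 1
    return large, throttle, largest
```

To use it, pass an open log file (for example `summarize(open(path))`, where path is wherever the node writes its console log) and look for minutes with many large-object writes or throttle changes around the time of each stall.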