0 byte images saved to riak cluster

2013-05-21 Thread kzhang
We use a five-node riak cluster to store site images; the code is written in PHP using the riak PHP client library. From time to time, we save the right keys but with 0 byte image files (the image files provided are legit). I checked riak log files but could not find anything relevant. Does anyone

Re: 0 byte images saved to riak cluster

2013-05-22 Thread kzhang
Update -- we found the issue. There were some post requests coming in from a form, obviously not using image content type. We tried: curl -v -s -o/dev/null http://xx:8098/riak/45/21601_7803328_3.jpg --> Fails curl -v -s -o/dev/null -H "Accept: image/*" http://xxx:8098/riak/45/21601_78033

riak 1.4.0 upgrade failed

2013-07-19 Thread kzhang
We were on 1.3.0, I was able to upgrade it to 1.3.2 (sudo rpm -Uvh riak-1.3.2-2.el6.x86_64.rpm). After that, I was trying to upgrade it to 1.4.0 (sudo rpm -Uvh riak-1.4.0-1.el6.x86_64.rpm). I got: error: %pre(riak-1.4.0-1.el6.x86_64) scriptlet failed, exit status 8 error: install: %pre scriptlet

Re: riak 1.4.0 upgrade failed

2013-07-19 Thread kzhang
Thanks! install went through. # sudo rpm -Uvh riak-1.4.0-1.el6.x86_64.rpm Preparing...### [100%] 1:riak warning: /etc/riak/app.config created as /etc/riak/app.config.rpmnew

Re: riak 1.4.0 upgrade failed

2013-07-19 Thread kzhang
Hi, Thanks for the reply. Sorry, I tried a few times, I did stop it at some point, still the same result. Just tried again and here is the result: # riak stop Attempting to restart script through sudo -H -u riak ok # riak-admin status Attempting to restart script through sudo -H -u riak Node is

Re: riak 1.4.0 upgrade failed

2013-07-22 Thread kzhang
That worked! Thanks! Kathleen -- View this message in context: http://riak-users.197444.n3.nabble.com/riak-1-4-0-upgrade-failed-tp4028429p4028493.html Sent from the Riak Users mailing list archive at Nabble.com. ___ riak-users mailing list riak-user

Re: riak 1.4.0 upgrade failed

2013-07-25 Thread kzhang
I upgraded the riak cluster with 5 nodes. Unfortunately, after I upgraded the first node, the other four nodes all went down. I am not sure what caused the downtime. There was an erl_crash.dump on all five, with similar content: =erl_crash_dump:0.1 Wed Jul 24 11:37:56 2013 Slogan: Kernel pid term

Re: riak 1.4.0 upgrade failed

2013-07-26 Thread kzhang
Hi, Yeah, the cluster is running. Once we noticed the other four nodes were down, we quickly upgraded them and brought them back up. Riak Control is running and everything looks good. My only concern is why the other nodes went down while the first node was being upgraded. Thanks! Kathleen

distribution of data among riak cluster physical nodes

2013-10-09 Thread kzhang
We have a 5 node riak cluster to store site images, with N=3, R=1. When we turned off one node, a lot of GET requests failed, which made me think those requested images (3 copies of them) all landed on the failed physical node. Is there a way to tell how the replicas are distributed among the phy
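A rough way to reason about the question above is to model the ring directly. The sketch below is a simplified model and not Riak's actual claim algorithm: it assumes partitions are handed to nodes round-robin and that a key's N replicas sit on N consecutive partitions. The function names and the ring size of 64 are illustrative assumptions.

```python
from collections import Counter


def preference_lists(ring_size=64, nodes=5, n_val=3):
    """Return, for each partition, the node indexes holding its n_val replicas.

    Assumption: partitions are assigned to nodes round-robin, and replicas
    occupy n_val consecutive partitions (wrapping around the ring).
    """
    owners = [p % nodes for p in range(ring_size)]  # partition -> node index
    return [
        [owners[(p + i) % ring_size] for i in range(n_val)]
        for p in range(ring_size)
    ]


def distinct_node_histogram(preflists):
    """Count preference lists by how many distinct physical nodes they span."""
    return Counter(len(set(pl)) for pl in preflists)


if __name__ == "__main__":
    # Under this model, a key is at risk from a single node failure only
    # if its preference list spans fewer than n_val distinct nodes.
    print(distinct_node_histogram(preference_lists()))
```

Under these assumptions every preference list in a 64-partition, 5-node ring spans three distinct nodes, so losing one node should still leave two replicas of every key. Other ring-size and node-count combinations can produce overlap where the ring wraps around, which is why a real cluster's ownership needs to be checked rather than assumed.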

Re: distribution of data among riak cluster physical nodes

2013-10-10 Thread kzhang
In our environment, we have a 5-node cluster, N=3, R=1, W=2. We want to minimize read response time. basic_quorum and notfound_ok are at their defaults, false and true respectively. To still maintain our current response time when all 5 nodes are running and also to avoid the above scenario where the

Re: distribution of data among riak cluster physical nodes

2013-10-11 Thread kzhang
I read the documentation again (http://docs.basho.com/riak/latest/dev/references/http/fetch-object/). r - (read quorum) how many replicas need to agree when retrieving the object (default is defined by the bucket) pr - how many primary replicas need to be online when doing the read (default is def

Re: distribution of data among riak cluster physical nodes

2013-10-15 Thread kzhang
I think the below setting is what we want to use for our environment > r = 1 > notfound_ok=false > basic_quorum = true > > the client gets notfound if the first two replies are notfound? if the > first > reply is found, the client gets found? if the first reply is notfound, the > second is found, d
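The scenarios asked about above can be walked through with a small model. This is a sketch of the documented meaning of r, notfound_ok and basic_quorum, not Riak's own code: it ignores errors, timeouts, siblings and read repair, and the function name and reply labels are made up for illustration.

```python
def get_outcome(replies, n=3, r=1, notfound_ok=False, basic_quorum=True):
    """Decide a read from replies given in arrival order.

    replies: iterable of "found" / "notfound".
    Returns "found", "notfound", or None if the replies seen so far are
    not enough to decide.
    """
    majority = n // 2 + 1
    # Number of failed replies after which r successes can no longer be
    # reached, or (with basic_quorum) after which a majority has failed.
    fail_threshold = min(majority, n - r + 1) if basic_quorum else n - r + 1

    found = notfound = 0
    for reply in replies:
        if reply == "found":
            found += 1
        elif reply == "notfound":
            notfound += 1
        else:
            raise ValueError(f"unexpected reply: {reply!r}")

        # With notfound_ok, a notfound counts toward r like any other reply.
        successes = found + (notfound if notfound_ok else 0)
        if successes >= r:
            return "found" if found else "notfound"
        if not notfound_ok and notfound >= fail_threshold:
            return "notfound"
    return None


if __name__ == "__main__":
    for case in (["notfound", "notfound"], ["found"], ["notfound", "found"]):
        print(case, "->", get_outcome(case))
```

In this model, with r=1, notfound_ok=false and basic_quorum=true, the read returns as soon as any replica reports the object, and reports notfound only after two of the three replicas have said so. With basic_quorum=false it would wait for all three notfound replies.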

tuning number of async threads in Riak's default pool

2013-10-29 Thread kzhang
There is a section in http://docs.basho.com/riak/1.4.0/cookbooks/Linux-Performance-Tuning/: If using LevelDB as the storage backend (which maintains its own I/O thread pool), the number of async threads in Riak's default pool can be decreased in the /etc/riak/vm.args file: +A 16 Can I please kn

Re: tuning number of async threads in Riak's default pool

2013-11-01 Thread kzhang
Does anyone have any insight?

Re: tuning number of async threads in Riak's default pool

2013-11-01 Thread kzhang
Thanks! Could you elaborate on it?

Re: LevelDB tuning questions.

2013-11-07 Thread kzhang
I am reading through http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/ and am a little confused by the 'cache size' section, where the calculation of cache size is based on 50% of physical memory, and the sentence here also seems to say to allocate 50% of memory to block cach

Re: LevelDB tuning questions.

2013-11-07 Thread kzhang
Thanks for the reply! So to properly size leveldb memory for riak 1.4, should I just use the 'memory model spreadsheet' attached in the article (http://docs.basho.com/riak/latest/ops/advanced/backends/leveldb/)?

Re: LevelDB tuning questions.

2013-11-07 Thread kzhang
Thanks again! I have a few questions regarding the new spreadsheet. 'percent reserved', in the spreadsheet, it is set at 10%, we should really use 50%? 'vnode count', it is 'vnode count per server'? E11 (max_open_file, with AAE), shows as a static value '10*4*1024*1024', whereas D11(max_open_fi

Re: LevelDB tuning questions.

2013-11-07 Thread kzhang
I am attaching my calculation based on our environment. Since I am getting negative remainders, is my best bet to change max_open_file? Should I tweak cache_size at all? Copy_of_leveldb_sizing_1_4_(2).xls

Re: LevelDB tuning questions.

2013-11-07 Thread kzhang
Thanks! The cluster has been in production for 4 months. I found this in leveldb log file: 2013/10/25-12:43:10.678239 7fb895781700 compacted to: files[ 0 1 5 31 51 0 0 ] 2013/10/29-16:57:26.280633 7fb895781700 compacted to: files[ 0 2 5 31 51 0 0 ] 2013/11/03-11:46:15.935006 7fb895781700 compacted
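To track how the per-level file counts in log lines like these change over time, the lines can be parsed mechanically. The sketch below assumes only the line shape quoted in the message (timestamp, thread id, "compacted to: files[ ... ]"); the function name is an illustrative choice and other LevelDB log line types are ignored.

```python
import re

# Matches e.g.:
# 2013/10/29-16:57:26.280633 7fb895781700 compacted to: files[ 0 2 5 31 51 0 0 ]
LINE_RE = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d+)\s+\S+\s+"
    r"compacted to: files\[\s*(?P<counts>[\d\s]*?)\s*\]"
)


def files_per_level(log_text):
    """Return a list of (timestamp, [file count for level 0, level 1, ...])."""
    results = []
    for match in LINE_RE.finditer(log_text):
        counts = [int(c) for c in match.group("counts").split()]
        results.append((match.group("ts"), counts))
    return results


if __name__ == "__main__":
    sample = (
        "2013/10/29-16:57:26.280633 7fb895781700 "
        "compacted to: files[ 0 2 5 31 51 0 0 ]"
    )
    for ts, counts in files_per_level(sample):
        print(ts, counts, "total files:", sum(counts))
```

Incomplete lines, such as one cut off before the closing bracket, simply do not match and are skipped.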

riak nagios script

2013-12-09 Thread kzhang
I ran the script: https://github.com/basho/riak_nagios/blob/master/src/check_riak_kv_up.erl I got: {critical,"Unable to get list of services running on ~s: ~p", ['xx.xx.xx.xx',{badrpc,nodedown}]} I know riak is running fine on this node. badrpc seems to be the culprit. How do I get a

Re: riak nagios script

2013-12-09 Thread kzhang
Also, when running https://github.com/basho/riak_nagios/blob/master/src/check_node.erl I ran into the error: ** exception error: undefined function getopt:parse/2 in function check_node:main/2 (check_node.erl, line 15)

Re: riak nagios script

2013-12-10 Thread kzhang
Thanks Hector. Here is how I executed the script. I downloaded and installed the erlang shell from http://www.erlang.org/documentation/doc-5.3/doc/getting_started/getting_started.html started erlang OTP: [root@MYRIAKNODE otp_src_R16B02]# erl -s toolbar Erlang R16B02 (erts-5.10.3) [source] [64-

Re: riak nagios script

2013-12-10 Thread kzhang
Hi Alex, Thanks. I am completely new to erlang. When googling how to run an erlang program, I came across http://www.erlang.org/documentation/doc-5.3/doc/getting_started/getting_started.html . That's how I got started. To run the script using escript, based on http://www.erlang.org/doc/man/escr

fsm.get time increase on a less busy system

2013-12-17 Thread kzhang
We have a 5-node cluster. In the past few days, all the counters indicate a less busy system (lower cpu, more available memory, fewer puts and gets, less network traffic), yet get time has increased. I am not sure what's going on. I checked the logs, but nothing jumped out at me.

Re: fsm.get time increase on a less busy system

2013-12-17 Thread kzhang
Thanks for the reply. The version is riak 1.4.0. I am looking at node.get.fsm.time.mean and node.get.fsm.time.median. Both increased.

Re: fsm.get time increase on a less busy system

2013-12-17 Thread kzhang
Hi Sean, To answer your questions -- yeah, the cluster is read heavy, gets:puts ratio is ~80:1. In the past day, node gets averages 1.4k and node puts averages 20.
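The figures discussed in this thread can be pulled from a node's stats output and compared over time. The sketch below assumes the JSON returned by Riak's HTTP stats resource, with the underscore-named keys node_gets, node_puts, node_get_fsm_time_mean and node_get_fsm_time_median, and assumes the FSM times are reported in microseconds. The function name is illustrative, and the latency values in the sample are invented placeholders, not measurements from this cluster.

```python
import json


def summarize_stats(stats_json):
    """Summarize read load and get-FSM latency from a stats JSON document."""
    stats = json.loads(stats_json)
    gets = stats["node_gets"]
    puts = stats["node_puts"]
    return {
        # None when there were no puts, to avoid dividing by zero.
        "gets_per_put": gets / puts if puts else None,
        # Assumed to be microseconds in the stats output; shown here in ms.
        "get_fsm_mean_ms": stats["node_get_fsm_time_mean"] / 1000.0,
        "get_fsm_median_ms": stats["node_get_fsm_time_median"] / 1000.0,
    }


if __name__ == "__main__":
    # Gets and puts follow the averages quoted in the message; the two
    # latency values are placeholders for illustration only.
    sample = json.dumps({
        "node_gets": 1400,
        "node_puts": 20,
        "node_get_fsm_time_mean": 2500,
        "node_get_fsm_time_median": 1800,
    })
    print(summarize_stats(sample))
```

Recording this summary at intervals makes it easier to see whether the mean and the median move together or whether only the mean is being pulled up by a few slow reads.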