Hi Julien,

Changing to LevelDB would probably make sense for your environment, as all keys 
will no longer need to be kept in memory. You should therefore be able to 
handle a larger amount of data. However, 4GB is not a lot of RAM, so the 
default settings will need to be tuned, as they are generally optimised for 
servers with more available RAM.
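
As a starting point, the relevant settings live in the eleveldb section of 
app.config. The values below are purely illustrative for a low-memory node (not 
a tested recommendation for your workload), and bear in mind that each vnode 
gets its own cache and write buffer, so the totals multiply by the number of 
vnodes per node:

    {eleveldb, [
        {data_root, "/var/lib/riak/leveldb"},
        {max_open_files, 30},               %% small per-vnode file handle cache
        {cache_size, 8388608},              %% 8MB block cache per vnode (illustrative)
        {write_buffer_size_min, 8388608},   %% 8MB
        {write_buffer_size_max, 16777216}   %% 16MB
    ]},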

In order to reduce resource consumption with LevelDB, I would recommend using a 
much smaller ring size. For a 5-node cluster we usually recommend a minimum 
ring size of 64, but in your case 32 may actually be a more suitable option.
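
For reference, the ring size is set via ring_creation_size in the riak_core 
section of app.config. Note that it can only be chosen when the cluster is 
first created, so this is effectively a rebuild-time decision:

    {riak_core, [
        {ring_creation_size, 32}
        %% other riak_core settings left as they are
    ]},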

Riak also runs best without swap, so you should ensure that the 4GB of RAM you 
have available is real RAM and not backed by swap space, as swapping will 
negatively affect performance. Also follow the Linux tuning guidelines and 
ensure swap is disabled.
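
For example, on a typical Linux box (commands are illustrative; check the 
tuning guide for your distribution):

    sudo swapoff -a                                          # disable swap right away
    echo "vm.swappiness = 0" | sudo tee -a /etc/sysctl.conf  # discourage swapping if it comes back
    sudo sysctl -p
    # also comment out any swap entries in /etc/fstab so the change survives a reboot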

As Riak performance also heavily depends on the speed of the disks, ensure 
these are configured to be as fast and efficient as possible.
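
For instance, mounting the data volume with noatime and using a simple I/O 
scheduler usually helps on virtualised disks (the device and mount point below 
are just examples):

    # /etc/fstab entry for the Riak data volume (example device)
    /dev/xvdb   /var/lib/riak   ext4   noatime,nodiratime   0 0

    # prefer the deadline (or noop) scheduler for the data disk
    echo deadline | sudo tee /sys/block/xvdb/queue/scheduler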

This is the best advice I have at the moment. If I find any other settings that 
may need to be tweaked, I will let you know.

Best regards,

Christian



On 23 May 2013, at 09:23, Julien Genestoux <julien.genest...@gmail.com> wrote:

> Hello Christian, all
> 
> So, after thinking a little bit and reading more, we decided to switch to 
> LevelDB. For now we use the default config, but we will probably have to 
> change a couple of things in the future.
> 
> We also monitor the whole cluster using collectd, and we see some interesting 
> behaviors on which I'd love to have your feedback.
> 
> We monitor the node_[get|put]_fsm_time_* stats and we noticed that our 100th 
> percentile behaves particularly badly. I have attached a collectd graph for 
> it. (Ignore the 'hole' in the data collection; we had to update a couple of 
> things on the collecting server.)
> 
> The median for get is 1259ms and the mean is 3334ms, which is the same order 
> of magnitude, but the 100th percentile is at 401996ms on average, which is 2 
> orders of magnitude larger. Is that common? We see exactly the same thing for 
> put requests.
> 
> I am not sure whether this is the cause, but we have a big spread in object 
> sizes. The median and mean are 1545 and 1951 bytes respectively, versus 
> 32315 bytes for the 100th percentile (just 1 order of magnitude larger, 
> though). Could this be related?
> 
> To investigate this, we have started 'instrumenting' our clients too, to see 
> which queries are the slow ones, and well, we have not found them! What's 
> even weirder is that for get and put requests we consistently see latencies 
> of roughly 150ms on average, and we *never* see anything like 402,000ms! 
> Where could these come from? My guess is that node_[get|put]_fsm_time_* 
> reflects how the internal (inside the cluster) get/put operations behave, 
> with n, r and w, but could you confirm? Would you say these values make sense?
> 
> For your reference, we have a cluster of 5 Xen instances (on different hosts), 
> with 4GB of RAM each.
> The cluster currently has 2 buckets:
> - "feeds" which has allow_mult enabled and stores basically arrays of at most 
> 10 elements. Each element is a key for the 2nd bucket. n_val=3. We have about 
> 213,644 objects i this bucket.
> - "entries" which holds complex json objects, allow_mult is not enabled, and 
> the n_val=2. We have about 2,398,908 of these elements.
> 
> As you can see, we have a bunch of 'lost' elements, which sucks, but we're not 
> sure of the best way to deal with them.
> 
> That's it for now :)
> 
> Thanks a lot!
> 
> 
> 
> 
> --
> Got a blog? Make following it simple: https://www.subtome.com/
> 
> Julien Genestoux,
> http://twitter.com/julien51
> 
> +1 (415) 830 6574
> +33 (0)9 70 44 76 29
> 
> 
> On Fri, May 17, 2013 at 3:25 PM, Christian Dahlqvist <christ...@basho.com> 
> wrote:
> Hi Julien,
> 
> You will need to update the app.config file and restart the servers in order 
> for the changes to take effect.
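> 
> Something along these lines on each node in turn should do it (the node name 
> below is a placeholder):
> 
> riak stop
> riak start
> riak-admin wait-for-service riak_kv riak@<node>   # wait before restarting the next node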
> 
> Best regards,
> 
> Christian
> 
> 
> 
> 
> On 17 May 2013, at 14:05, Julien Genestoux <julien.genest...@gmail.com> wrote:
> 
>> Great! Thanks Christian. Is that something I can change at runtime, or do I 
>> have to stop the server?
>> Also, would it make sense to change the backend if we have a lot of deletes?
>> 
>> Thanks,
>> 
>> On Fri, May 17, 2013 at 2:45 PM, Christian Dahlqvist <christ...@basho.com> 
>> wrote:
>> Hi Julien,
>> 
>> I believe from an earlier email that you are using bitcask as a backend. 
>> This works with immutable append-only files, and data that is deleted or 
>> overwritten will stay in the files and take up disk space until the file is 
>> closed and can be merged. The max file size is by default 2GB, but this and 
>> other parameters determining how and when merging of closed files is 
>> performed can be tuned. Please see 
>> http://docs.basho.com/riak/latest/tutorials/choosing-a-backend/Bitcask/ for 
>> further details.
>> 
>> If you wish to reduce the amount of disk space used, you may want to set a 
>> smaller max file size in order to allow merging to occur more frequently.
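>> 
>> For example, in the bitcask section of app.config (the values below are 
>> purely illustrative; see the documentation above for the defaults and the 
>> exact semantics of the merge triggers):
>> 
>> {bitcask, [
>>     {data_root, "/var/lib/riak/bitcask"},
>>     {max_file_size, 536870912},           %% close files at 512MB instead of 2GB
>>     {frag_merge_trigger, 40},             %% merge when 40% of a file's keys are dead
>>     {dead_bytes_merge_trigger, 268435456} %% or when 256MB of a file is dead data
>> ]},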
>> 
>> Best regards,
>> 
>> Christian
>> 
>> 
>> 
>> On 17 May 2013, at 13:06, Julien Genestoux <julien.genest...@gmail.com> 
>> wrote:
>> 
>>> Christian, All
>>> 
>>> Our servers still have not died... but we see another strange behavior: our 
>>> data store needs a lot more space than we expect.
>>> 
>>> Based on the status command, the average size of our objects 
>>> (node_get_fsm_objsize_mean) is about 1500 bytes.
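>>> (For reference, we read that number off the node with something along 
>>> these lines; the grep is just a convenience:)
>>> 
>>> riak-admin status | grep objsize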
>>> We have 2 buckets, but both of them have an n value of 3.
>>> 
>>> When we count the values in each of the buckets, using the following 
>>> MapReduce job:
>>> 
>>> curl -XPOST http://192.168.134.42:8098/mapred \
>>>   -H 'Content-Type: application/json' \
>>>   -d '{"inputs":"BUCKET",
>>>        "query":[{"reduce":{"language":"erlang",
>>>                            "module":"riak_kv_mapreduce",
>>>                            "function":"reduce_count_inputs",
>>>                            "arg":{"do_prereduce":true}}}],
>>>        "timeout": 100000}'
>>> 
>>> We get 194556 for one and 1572661 for the other (these numbers are 
>>> consistent with what we expected), so if our math is right, we need total 
>>> disk space of roughly
>>> 3 * (194556 + 1572661) * 1500 bytes = 7.4 GB.
>>> 
>>> Now, though, when we inspect the storage actually occupied on our hard 
>>> drives, we see something weird (this is the du output):
>>> riak1. 2802888 /var/lib/riak
>>> riak2. 4159976 /var/lib/riak
>>> riak5. 4603312 /var/lib/riak
>>> riak3. 4915180 /var/lib/riak
>>> riak4. 37466784  /var/lib/riak
>>> 
>>> As you can see, not all nodes have the same "size". What's even weirder is 
>>> that up until a couple of hours ago they were all growing "together", 
>>> close to what the riak4 node shows. Could this be due to the "delete" 
>>> policy? It turns out that we delete a lot of items. (Is there a way to get 
>>> the list of commands sent to a node/cluster?)
>>> 
>>> Thanks!
>>> 
>>> 
>>> 
>>> On Wed, May 15, 2013 at 11:29 PM, Julien Genestoux 
>>> <julien.genest...@gmail.com> wrote:
>>> Christian, all,
>>> 
>>> Not sure what kind of magic happened, but no server has died in the last 
>>> 2 days... and counting.
>>> We have not changed a single line of code, which is quite odd...
>>> I'm still monitoring everything and hoping (sic!) for a failure soon so we 
>>> can fix the problem!
>>> 
>>> Thanks
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Got a blog? Make following it simple: https://www.subtome.com/
>>> 
>>> Julien Genestoux,
>>> http://twitter.com/julien51
>>> 
>>> +1 (415) 830 6574
>>> +33 (0)9 70 44 76 29
>>> 
>>> 
>>> On Tue, May 14, 2013 at 12:31 PM, Julien Genestoux 
>>> <julien.genest...@gmail.com> wrote:
>>> Thanks Christian.
>>> We do indeed use MapReduce, but it's a fairly simple job: we retrieve a 
>>> first object whose value is an array of at most 10 ids, and then we fetch 
>>> the values for those 10 ids.
>>> However, this MapReduce job is quite rare (maybe 10 times a day at most at 
>>> this point...), so I don't think that's our issue.
>>> I'll try running the cluster without any calls to it to see if that's 
>>> better, but I'd be very surprised. Also, we were already doing this even 
>>> before we allowed for multiple values, and the cluster was stable back then.
>>> We do not do key listing or anything like that.
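>>> 
>>> (For context, the job is roughly of this shape. This is only an 
>>> illustrative sketch using JavaScript phases and the built-in 
>>> Riak.mapValuesJson, not our actual code, and the feed key is a placeholder:)
>>> 
>>> curl -XPOST http://192.168.134.42:8098/mapred \
>>>   -H 'Content-Type: application/json' \
>>>   -d '{"inputs": [["feeds", "SOME_FEED_KEY"]],
>>>        "query": [
>>>          {"map": {"language": "javascript",
>>>                   "source": "function(v) { var ids = JSON.parse(v.values[0].data); var next = []; for (var i = 0; i < ids.length; i++) { next.push([\"entries\", ids[i]]); } return next; }"}},
>>>          {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
>>>        ]}'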
>>> 
>>> I'll try looking at the statistics too.
>>> 
>>> Thanks,
>>> 
>>> 
>>> 
>>> 
>>> On Tue, May 14, 2013 at 11:50 AM, Christian Dahlqvist <christ...@basho.com> 
>>> wrote:
>>> Hi Julien,
>>> 
>>> The node appears to have crashed due to an inability to allocate memory. 
>>> How are you accessing your data? Are you running any key listing or large 
>>> MapReduce jobs that could use up a lot of memory?
>>> 
>>> In order to ensure that you are resolving siblings efficiently, I would 
>>> recommend monitoring the statistics in Riak 
>>> (http://docs.basho.com/riak/latest/cookbooks/Statistics-and-Monitoring/). 
>>> Specifically, look at the node_get_fsm_objsize_* and node_get_fsm_siblings_* 
>>> statistics in order to identify objects that are very large or have lots of 
>>> siblings.
>>> 
>>> Best regards,
>>> 
>>> Christian
>>> 
>>> 
>>> 
>>> On 13 May 2013, at 16:44, Julien Genestoux <julien.genest...@gmail.com> 
>>> wrote:
>>> 
>>>> Christian, All,
>>>> 
>>>> Bad news: my laptop is completely dead. Good news: I have a new one, and 
>>>> it's now fully operational (backups FTW!).
>>>> 
>>>> The log files have finally been uploaded: 
>>>> https://www.dropbox.com/s/j7l3lniu0wogu29/riak-died.tar.gz
>>>> 
>>>> I have attached to that mail our config.
>>>> 
>>>> The machine is a virtual Xen instance at Linode with 4GB of memory. I know 
>>>> it's probably not the very best setup, but 1) we're on a budget and 2) we 
>>>> assumed that would fit our needs quite well.
>>>> 
>>>> Just to give a bit more detail: initially we did not use allow_mult and 
>>>> things worked out fine for a couple of days. As soon as we enabled 
>>>> allow_mult, we were not able to run the cluster for more than 5 hours 
>>>> without seeing nodes fail, which is why I'm convinced we must be doing 
>>>> something wrong. The question is: what?
>>>> 
>>>> Thanks
>>>> 
>>>> 
>>>> On Sun, May 12, 2013 at 8:07 PM, Christian Dahlqvist <christ...@basho.com> 
>>>> wrote:
>>>> Hi Julien,
>>>> 
>>>> I was not able to access the logs based on the link you provided.
>>>> 
>>>> Could you please attach a copy of your app.config file so we can get a 
>>>> better understanding of the configuration of your cluster? Also, what is 
>>>> the specification of the machines in the cluster?
>>>> 
>>>> How much data do you have in the cluster and how are you querying it?
>>>> 
>>>> Best regards,
>>>> 
>>>> Christian
>>>> 
>>>> 
>>>> 
>>>> On 12 May 2013, at 19:11, Julien Genestoux <julien.genest...@gmail.com> 
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> We are running a cluster of 5 servers, or at least trying to, because 
>>>>> nodes seem to be dying 'randomly' without any obvious reason. We don't 
>>>>> have a great Erlang guy aboard, and the error logs are not that verbose.
>>>>> So I've just made a .tgz of the whole log directory and I was hoping 
>>>>> somebody could give us a clue.
>>>>> It's here: https://www.dropbox.com/s/z9ezv0qlxgfhcyq/riak-died.tar.gz
>>>>> (might not be fully uploaded to Dropbox yet!)
>>>>> 
>>>>> I've looked at the mailing list archive, and some people said their 
>>>>> servers were dying because an object was too big to fit in memory. Maybe 
>>>>> that's what we're seeing?
>>>>> 
>>>>> As one of our buckets is set with allow_mult, I am tempted to think that 
>>>>> some objects' sizes may be exploding.
>>>>> However, we do actually try to resolve conflicts in our code. Any idea 
>>>>> how to confirm, and then debug, whether we have an issue there?
>>>>> 
>>>>> 
>>>>> Thanks a lot for your precious help...
>>>>> 
>>>>> Julien
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> <app.config>
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> <fsm_time.png>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
