Hi Julien,

You will need to update the app.config file and restart the servers in order 
for the changes to take effect.

Best regards,


On 17 May 2013, at 14:05, Julien Genestoux <julien.genest...@gmail.com> wrote:

> Great! Thanks Christian, is that something I can change at runtime or do I 
> have to stop the server?
> Also, would it make sense to change the backend if we have a lot of deletes? 
> Thanks,
> On Fri, May 17, 2013 at 2:45 PM, Christian Dahlqvist <christ...@basho.com> 
> wrote:
> Hi Julien,
> I believe from an earlier email that you are using bitcask as a backend. This 
> works with immutable append-only files, and data that is deleted or 
> overwritten will stay in the files and take up disk space until the file is 
> closed and can be merged. The max file size is by default 2GB, but this and 
> other parameters determining how and when merging of closed files is 
> performed can be tuned. Please see 
> http://docs.basho.com/riak/latest/tutorials/choosing-a-backend/Bitcask/ for 
> further details.
> If you wish to reduce the amount of disk space used, you may want to set a 
> smaller max file size in order to allow merging to occur more frequently.
> Best regards,
> Christian
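
To make the point above concrete, here is a toy Python model of an append-only store. It is only a sketch under simplifying assumptions, not Bitcask's actual implementation: the class name, the byte-size accounting and the single-pass merge are all hypothetical. It shows that deleted data keeps occupying space until the file holding it has been closed and merged, so a smaller max file size lets space be reclaimed sooner.

```python
class ToyAppendOnlyStore:
    """Toy append-only store (hypothetical; NOT Bitcask's real code)."""

    def __init__(self, max_file_size):
        self.max_file_size = max_file_size
        self.closed = []      # closed files, each a list of (key, value) records
        self.active = []      # the one file still open for appends
        self.active_size = 0

    @staticmethod
    def _size(record):
        key, value = record
        return len(key) + (len(value) if value is not None else 0)

    def _append(self, key, value):
        record = (key, value)
        # Close the active file once the next record would exceed the limit.
        if self.active and self.active_size + self._size(record) > self.max_file_size:
            self.closed.append(self.active)
            self.active, self.active_size = [], 0
        self.active.append(record)
        self.active_size += self._size(record)

    def put(self, key, value):
        self._append(key, value)

    def delete(self, key):
        # A delete only appends a tombstone; the old value stays on disk.
        self._append(key, None)

    def disk_usage(self):
        files = self.closed + [self.active]
        return sum(self._size(r) for f in files for r in f)

    def merge(self):
        # Only closed files are merged; the active file is left untouched.
        ordered = [r for f in self.closed for r in f]
        latest = {}
        for i, (key, _) in enumerate(ordered + self.active):
            latest[key] = i
        survivors = [
            (key, value)
            for i, (key, value) in enumerate(ordered)
            if latest[key] == i and value is not None
        ]
        self.closed = [survivors] if survivors else []


def demo(max_file_size):
    """Write 10 objects, delete 8, and report usage before and after a merge."""
    store = ToyAppendOnlyStore(max_file_size)
    for i in range(10):
        store.put(f"k{i}", b"x" * 20)
    for i in range(8):
        store.delete(f"k{i}")
    before = store.disk_usage()
    store.merge()
    return before, store.disk_usage()


if __name__ == "__main__":
    print(demo(1000))  # nothing was ever closed, so the merge reclaims nothing
    print(demo(100))   # old values sit in closed files and are reclaimed
```

In this toy model the large limit never closes a file, so usage is the same before and after the merge, while the small limit lets the merge drop every deleted value.
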
> On 17 May 2013, at 13:06, Julien Genestoux <julien.genest...@gmail.com> wrote:
>> Christian, All
>> Our servers still have not died... but we see another strange behavior: our 
>> data store needs a lot more space than what we expect.
>> Based on the status command, the average size of our object 
>> (node_get_fsm_objsize_mean) is about 1500 bytes.
>> We have 2 buckets, and both of them have an n value of 3.
>> When we count the values in each of the buckets (using the following 
>> mapreduce)
>> curl -XPOST -H 'Content-Type: application/json' -d \
>> '{"inputs":"BUCKET","query":[{"reduce":{"language":"erlang","module":"riak_kv_mapreduce","function":"reduce_count_inputs","arg":{"do_prereduce":true}}}],"timeout": 100000}'
>> We get 194556 for one and 1572661 for the other one (these numbers are 
>> consistent with what we expected), so if our math is right, we do need a 
>> total disk of 
>> 3 * (194556 + 1572661 ) * 1500 bytes = 7.4 GB.
>> Now, though, when I inspect the storage actually occupied on our hard 
>> drives, we see something weird:
>> (this is the du output)
>> riak1. 2802888 /var/lib/riak
>> riak2. 4159976 /var/lib/riak
>> riak5. 4603312 /var/lib/riak
>> riak3. 4915180 /var/lib/riak
>> riak4. 37466784 /var/lib/riak
>> As you can see not all nodes have the same "size". What's even weirder is 
>> that up until a couple hours ago, they were all growing "together" and close 
>> to what the riak4 node shows. Could this be due to the "delete" policy? It 
>> turns out that we delete a lot of items (is there a way to get the list of 
>> commands sent to a node/cluster?)
>> Thanks!
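
The estimate and the du figures in the message above can be checked with a short Python sketch. The numbers are taken from the message; the assumption (not stated in the thread) is that du reported 1 KiB blocks, which is the usual GNU du default. The function names are made up for illustration.

```python
REPLICAS = 3                      # n value reported for both buckets
KEY_COUNTS = (194556, 1572661)    # MapReduce counts from the message
MEAN_OBJECT_BYTES = 1500          # approximate node_get_fsm_objsize_mean

# du output per node. ASSUMPTION: units are 1 KiB blocks (GNU du default).
DU_KIB = {
    "riak1": 2802888,
    "riak2": 4159976,
    "riak5": 4603312,
    "riak3": 4915180,
    "riak4": 37466784,
}

GIB = 2 ** 30


def expected_bytes():
    """Raw object data only: replicas * object count * mean object size."""
    return REPLICAS * sum(KEY_COUNTS) * MEAN_OBJECT_BYTES


def used_bytes():
    """Total space reported by du across all five nodes."""
    return sum(DU_KIB.values()) * 1024


if __name__ == "__main__":
    print(f"expected: {expected_bytes() / GIB:.1f} GiB")
    print(f"used:     {used_bytes() / GIB:.1f} GiB")
    print(f"ratio:    {used_bytes() / expected_bytes():.1f}x")
```

Under that assumption the 7.4 figure holds in binary units (GiB), and the cluster as a whole uses several times the estimate, most of it on riak4. The estimate counts raw object bytes only, so some gap is to be expected even without deleted data.
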
>> On Wed, May 15, 2013 at 11:29 PM, Julien Genestoux 
>> <julien.genest...@gmail.com> wrote:
>> Christian, all,
>> Not sure what kind of magic happened, but no server died in the last 2 
>> days... and counting.
>> We have not changed a single line of code, which is quite odd...
>> I'm still monitoring everything and hope (sic!) for a failure soon so we can 
>> fix the problem!
>> Thanks
>> On Tue, May 14, 2013 at 12:31 PM, Julien Genestoux 
>> <julien.genest...@gmail.com> wrote:
>> Thanks Christian. 
>> We do indeed use mapreduce but it's a fairly simple function:
>> We retrieve a first object whose value is an array of at most 10 ids and 
>> then we fetch all the values for these 10 ids.
>> However, this mapreduce job is quite rare (maybe 10 times a day at most at 
>> this point...) so I don't think that's our issue.
>> I'll try to run the cluster without any call to that to see if that's 
>> better, but I'd be very surprised. Also, we were doing this already even 
>> before we allowed multiple values, and the cluster was stable back then.
>> We do not do key listing or anything like that.
>> I'll try looking at the statistics too.
>> Thanks,
>> On Tue, May 14, 2013 at 11:50 AM, Christian Dahlqvist <christ...@basho.com> 
>> wrote:
>> Hi Julien,
>> The node appears to have crashed due to inability to allocate memory. How are 
>> you accessing your data? Are you running any key listing or large MapReduce 
>> jobs that could use up a lot of memory?
>> In order to ensure that you are efficiently resolving siblings I would 
>> recommend you monitor the statistics in Riak 
>> (http://docs.basho.com/riak/latest/cookbooks/Statistics-and-Monitoring/). 
>> Specifically look at node_get_fsm_objsize_* and node_get_fsm_siblings_* 
>> statistics in order to identify objects that are very large or have lots of 
>> siblings.
>> Best regards,
>> Christian
>> On 13 May 2013, at 16:44, Julien Genestoux <julien.genest...@gmail.com> 
>> wrote:
>>> Christian, All,
>>> Bad news: my laptop is completely dead. Good news: I have a new one, and 
>>> it's now fully operational (backups FTW!).
>>> The log files have finally been uploaded: 
>>> https://www.dropbox.com/s/j7l3lniu0wogu29/riak-died.tar.gz
>>> I have attached our config to this mail.
>>> The machine is a virtual Xen instance at Linode with 4GB of memory. I know 
>>> it's probably not the very best setup, but 1) we're on a budget and 2) we 
>>> assumed that would fit our needs quite well.
>>> To give more detail: initially we did not use allow_mult, and things 
>>> worked fine for a couple of days. As soon as we enabled allow_mult, we 
>>> were not able to run the cluster for more than 5 hours without seeing 
>>> failing nodes, which is why I'm convinced we must be doing something 
>>> wrong. The question is: what?
>>> Thanks
>>> On Sun, May 12, 2013 at 8:07 PM, Christian Dahlqvist <christ...@basho.com> 
>>> wrote:
>>> Hi Julien,
>>> I was not able to access the logs based on the link you provided.
>>> Could you please attach a copy of your app.config file so we can get a 
>>> better understanding of the configuration of your cluster? Also, what is 
>>> the specification of the machines in the cluster?
>>> How much data do you have in the cluster and how are you querying it?
>>> Best regards,
>>> Christian
>>> On 12 May 2013, at 19:11, Julien Genestoux <julien.genest...@gmail.com> 
>>> wrote:
>>>> Hi,
>>>> We are running a cluster of 5 servers, or at least trying to, because 
>>>> nodes seem to be dying 'randomly'
>>>> without us knowing any reason why. We don't have a great Erlang guy 
>>>> aboard, and the error logs are not
>>>> that verbose.
>>>> So I've just tarred and gzipped the whole log directory, and I was hoping 
>>>> somebody could give us a clue.
>>>> It's there: https://www.dropbox.com/s/z9ezv0qlxgfhcyq/riak-died.tar.gz 
>>>> (might not be fully uploaded to dropbox yet!)
>>>> I've looked at the archive, and some people said their servers were dying 
>>>> because some objects were too big to allocate in memory. Maybe that's 
>>>> what we're seeing?
>>>> As one of our buckets is set with allow_mult, I am tempted to think that 
>>>> some object's size may be exploding.
>>>> However, we do actually try to resolve conflicts in our code. Any idea how 
>>>> to confirm and then debug that we 
>>>> have an issue there?
>>>> Thanks a lot for your precious help...
>>>> Julien
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users@lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> <app.config>
