Re: Data loads

Reid Draper Thu, 30 Aug 2012 08:31:21 -0700

Welcome to the list Pinney :)

On Aug 30, 2012, at 10:59 AM, Pinney Colton <pinney.col...@bitwisedata.com> 
wrote:


> Hi all -
> 
> This is my first post to the list.  I'm a relative Riak newbie, though I have 
> some experience working with multi-terabyte datasets on other platforms.  
> Last night, I kicked off my first "large" load of data as a test of the 
> platform with about 1,000,000 json objects being loaded into a bucket.  I had 
> a couple performance issues, so I'm wondering if someone on the list could be 
> kind enough to answer a few questions that will help me troubleshoot.
> 
> a) I haven't analyzed all of my load log data yet, but it looks like writes 
> went from about 0.02 seconds per object to a couple minutes per object!  This 
> is the typical "dev" setup from the tutorials, and I forgot to divide 
> available RAM by 4 to arrive at a number per node - is this likely the result 
> of a memory constraint, or should I be looking elsewhere, beyond just bumping 
> the memory on my VM?  I looked at the logs, but I'm not sure what I should be 
> looking for.

I'm wondering if you're starting to swap. Have you set swappiness to 0 on the 
machine? If not, I'd recommend that change.
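If it helps, here is a minimal sketch for checking the current value on a Linux box. It assumes the standard /proc/sys/vm/swappiness file is present; the helper names (parse_swappiness, read_swappiness, swappiness_advice) are made up for this example and are not part of Riak.

```python
from pathlib import Path


def parse_swappiness(text):
    """Parse the contents of /proc/sys/vm/swappiness into an int."""
    return int(text.strip())


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Read the kernel's vm.swappiness setting (Linux only, path assumed)."""
    return parse_swappiness(Path(path).read_text())


def swappiness_advice(value):
    """Return a short human-readable hint for the given swappiness value."""
    if value == 0:
        return "vm.swappiness is already 0"
    return f"vm.swappiness is {value}; consider running: sysctl vm.swappiness=0"


if __name__ == "__main__":
    print(swappiness_advice(read_swappiness()))
```

Changing the value with sysctl needs root and does not survive a reboot unless it is also written to /etc/sysctl.conf.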

> 
> b) I am using protocol buffers, and I saw similar initial performance when 
> running the load from a separate machine vs. having the data on the riak 
> machine itself.  Is that what you would recommend?  I'm wondering if there is 
> any hard/fast rule re: CPU/Memory contention on the machine vs. network 
> performance of loading from a different machine.

I don't think there is a hard and fast rule here, but I would try doing it over 
the network rather than on the same node.

> 
> c) I'm using a sha256 hash as my bucket name.  I read that buckets and keys 
> are concatenated internally and that all objects have just one "bucketkey".  
> Am I putting significantly more pressure on memory by using such a long 
> bucket name?  Or is Riak managing that for me via some sort of compression?  
> If that long hash is being replicated for each of those million objects, I 
> can see where my memory estimates would have been low.  I can always use an 
> integer ID for my bucket name, the hash just existed elsewhere in my 
> application, so I used it without thinking about it too much.

This depends on the backend you use. Bitcask holds all bucket/keys in memory, 
so their size is important. LevelDB doesn't have this constraint, and even has 
key prefix compression.
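To get a rough feel for what the bucket name costs under Bitcask, here is a back-of-the-envelope sketch. The numbers in it are assumptions, not measurements: per_entry_overhead is a placeholder for Bitcask's fixed per-key bookkeeping, key_len=16 is a made-up key size, the sha256 bucket name is assumed to be the 64-character hex form, and n_val=3 is Riak's default replication factor (on a single-machine dev setup all replicas share the same RAM).

```python
def keydir_bytes(num_objects, bucket_len, key_len,
                 per_entry_overhead=40, n_val=3):
    """Rough estimate of in-memory key bytes across all replicas.

    per_entry_overhead is an assumed placeholder, not a documented figure;
    substitute the value given for your Riak version.
    """
    per_entry = per_entry_overhead + bucket_len + key_len
    return num_objects * n_val * per_entry


if __name__ == "__main__":
    objects = 1_000_000
    long_bucket = keydir_bytes(objects, bucket_len=64, key_len=16)   # sha256 hex name
    short_bucket = keydir_bytes(objects, bucket_len=8, key_len=16)   # short integer ID
    print("sha256 bucket name:", long_bucket, "bytes")
    print("short bucket name: ", short_bucket, "bytes")
    print("difference:        ", long_bucket - short_bucket, "bytes")
```

The difference does not depend on the assumed overhead or key length: it is just objects * n_val * (64 - 8) bytes, so the long name is repeated for every replica of every object.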

> 
> Thanks in advance for your help!  Loving Riak so far, in spite of these 
> trivial hurdles.
> 
> Regards,
> Pinney
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

