A while ago on this list Nico Meyer did an amazingly detailed job of examining 
the overhead required when storing data in Riak using Bitcask 
(http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004292.html).
  That post inspired me to redo our wiki page, which I just published about an 
hour ago.

http://wiki.basho.com/Bitcask-Capacity-Planning.html

Nico, thanks!

The new page provides straightforward recommendations such as:

"""
To manage your estimated 183.9 million key/bucket pairs where bucket names are 
~10 bytes, keys are ~36 bytes, values are ~36 bytes and you are setting aside 
16.0 GiB of RAM per-node for in-memory data management within a cluster that is 
configured to maintain 3 replicas per key (N = 3) then Riak, using the Bitcask 
storage engine, will require at least 3 nodes each with at least 15.2 GiB of 
RAM and 120.6 GiB of storage space (361.8 GiB total storage space used across 
all nodes).

"""
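
For anyone who wants to sanity-check a recommendation like the one above by hand, here is a rough sketch of the kind of arithmetic involved. It is not the wiki calculator's actual code. The function name, the parameter names, and both per-entry overhead constants are placeholders of mine, so it will not reproduce the quoted figures exactly. The one relationship taken directly from the quoted text is that total storage is per-node storage times the number of nodes (120.6 GiB x 3 = 361.8 GiB). Requiring at least as many nodes as replicas is also my assumption.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def estimate_bitcask_capacity(num_keys, bucket_bytes, key_bytes, value_bytes,
                              n_val=3, ram_per_node_gib=16.0,
                              keydir_overhead_bytes=40,  # ASSUMPTION: placeholder per-entry RAM overhead
                              disk_overhead_bytes=0):    # ASSUMPTION: placeholder per-entry disk overhead
    """Hypothetical estimator: RAM holds bucket+key per replica, disk holds bucket+key+value."""
    replicas = num_keys * n_val  # every key is stored n_val times across the cluster

    # In-memory key directory: fixed overhead plus bucket and key bytes per entry.
    ram_per_entry = keydir_overhead_bytes + bucket_bytes + key_bytes
    total_ram_gib = replicas * ram_per_entry / GIB

    # On-disk entry: fixed overhead plus bucket, key and value bytes.
    disk_per_entry = disk_overhead_bytes + bucket_bytes + key_bytes + value_bytes
    total_disk_gib = replicas * disk_per_entry / GIB

    # Enough nodes to fit the key directory in the RAM set aside per node,
    # and (assumption) never fewer nodes than replicas.
    nodes = max(n_val, math.ceil(total_ram_gib / ram_per_node_gib))

    return {
        "nodes": nodes,
        "ram_per_node_gib": total_ram_gib / nodes,
        "disk_per_node_gib": total_disk_gib / nodes,
        "total_disk_gib": total_disk_gib,
    }


if __name__ == "__main__":
    # Inputs from the quoted example; overheads are left at the placeholder defaults.
    estimate = estimate_bitcask_capacity(183_900_000, 10, 36, 36)
    for name, value in estimate.items():
        print(name, round(value, 1))
```

Swapping in measured per-entry overheads from a real cluster for the two placeholder constants should bring the sketch closer to what you see in practice.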

I'd be interested in feedback as to how close the estimates are to real 
deployments.  The calculator doesn't (yet) take into account overhead due to 
links, tombstones, search, 2i, etc., all of which will increase the on-disk
space.  Also not accounted for is the way Bitcask files grow until merges are 
triggered.  That's a bit tricky to model, but I think I can project the upper 
bound.

cheers,

@gregburd
Developer Advocate, Basho Technologies | http://basho.com | @basho
