A while ago on this list Nico Meyer did an amazingly detailed job of examining the overhead required when storing data in Riak using Bitcask (http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-May/004292.html). That post inspired me to redo our wiki page, which I just published about an hour ago.
http://wiki.basho.com/Bitcask-Capacity-Planning.html

Nico, thanks!

The new page provides straightforward recommendations such as:

"""
To manage your estimated 183.9 million key/bucket pairs where bucket names are ~10 bytes, keys are ~36 bytes, values are ~36 bytes and you are setting aside 16.0 GiB of RAM per-node for in-memory data management within a cluster that is configured to maintain 3 replicas per key (N = 3) then Riak, using the Bitcask storage engine, will require at least 3 nodes each with at least 15.2 GiB of RAM and 120.6 GiB of storage space (361.8 GiB total storage space used across all nodes).
"""

I'd be interested in feedback on how close the estimates are to real deployments. The calculator doesn't (yet) take into account overhead due to links, tombstones, search, 2i, etc., all of which will increase the on-disk space. Also not accounted for is the way Bitcask files grow until merges are triggered. That's a bit tricky to model, but I think I can project the upper bound.

cheers,

@gregburd
Developer Advocate, Basho Technologies | http://basho.com | @basho
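For anyone who wants to sanity-check the sample estimate quoted above by hand, here is a rough Python sketch that works backwards from the quoted totals to the implied per-replica cost. It is not the calculator's actual formula. The function name and parameters are hypothetical, and it assumes the quoted RAM and disk figures are spread evenly across exactly 3 nodes and that 1 GiB = 2**30 bytes.

```python
GIB = 2**30  # bytes per GiB (assumption: binary gibibytes, as the "GiB" unit suggests)


def implied_bytes_per_replica(keys, n_val, nodes, ram_gib_per_node, disk_gib_per_node):
    """Back out the average RAM and disk bytes per stored replica.

    Hypothetical helper: it only divides the quoted cluster totals by the
    number of replicas; it does not model Bitcask's internal structures.
    """
    replicas = keys * n_val  # every key is stored n_val times
    ram_bytes = nodes * ram_gib_per_node * GIB / replicas
    disk_bytes = nodes * disk_gib_per_node * GIB / replicas
    return replicas, ram_bytes, disk_bytes


# Figures taken from the sample estimate quoted in this post.
replicas, ram_bytes, disk_bytes = implied_bytes_per_replica(
    keys=183_900_000, n_val=3, nodes=3,
    ram_gib_per_node=15.2, disk_gib_per_node=120.6,
)

raw_key_bytes = 10 + 36   # bucket name + key, from the quoted sizes
raw_value_bytes = 36      # value, from the quoted sizes

print(f"replicas stored:          {replicas:,}")
print(f"RAM bytes per replica:    {ram_bytes:.1f} (raw bucket+key: {raw_key_bytes})")
print(f"disk bytes per replica:   {disk_bytes:.1f} (raw value: {raw_value_bytes})")
```

With the quoted numbers this comes out to roughly 89 bytes of RAM and roughly 700 bytes of disk per replica, against 46 bytes of raw bucket-plus-key and 36 bytes of raw value. The gap is the per-entry overhead the estimate attributes to Riak and Bitcask.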