On Mon, May 23, 2011 at 10:53:29PM -0700, Anthony Molinaro wrote:
>
> On Mon, May 23, 2011 at 09:57:25PM -0600, David Smith wrote:
> > On Mon, May 23, 2011 at 9:39 PM, Anthony Molinaro
> > Thus, depending on
> > your merge triggers, more space can be used than is strictly necessary
> > to store the data.
>
> So the lack of any overhead in the calculation is expected?  I mean
> according to http://wiki.basho.com/Cluster-Capacity-Planning.html
>
> Disk = Estimated Total Objects * Average Object Size * n_val
>
> Which just seems wrong, doesn't it?  I don't quite understand the
> bitcask code well enough yet to see what the actual data it stores is,
> but the whitepaper suggested several things were involved in the on
> disk representation.

Okay, finally found the code for this part. I kept looking in the NIF, but
that's only the keydir, not the data files. It looks like

    %% Setup io_list for writing -- avoid merging binaries if we can help it
    Bytes0 = [<<Tstamp:?TSTAMPFIELD>>, <<KeySz:?KEYSIZEFIELD>>,
              <<ValueSz:?VALSIZEFIELD>>, Key, Value],
    Bytes  = [<<(erlang:crc32(Bytes0)):?CRCSIZEFIELD>> | Bytes0],

And looking at the header, it seems that there are 14 bytes of overhead
per entry (4 for CRC, 4 for timestamp, 2 for keysize, 4 for valsize).
So the disk calculation should be

    ( 14 + Key + Value ) * Num Entries * N_Val

So using my numbers from before, that gives

    ( 14 + 36 + 36 ) * 183915891 * 3 = 47450299878 bytes = 44.1 GB

which actually isn't much closer to 341 GB than the previous calculation :(

So all my questions from the previous email still apply.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                   <antho...@alumni.caltech.edu>
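
For anyone who wants to redo the arithmetic above with their own numbers, here is a minimal Python sketch of the same estimate. The format string and the function name `estimate_disk_bytes` are made up for illustration and are not taken from the bitcask source. The sketch assumes only the header layout described in the message (4-byte CRC, 4-byte timestamp, 2-byte key size, 4-byte value size), and it ignores hint files, merge behaviour, and any per-object metadata Riak adds to the value.

```python
import struct

# Header fields as described in the message: CRC (4 bytes), timestamp (4),
# key size (2), value size (4). ">" selects standard sizes with no padding.
# This format string is an illustrative assumption, not bitcask's own code.
HEADER_FORMAT = ">IIHI"
HEADER_BYTES = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 2 + 4


def estimate_disk_bytes(num_entries, key_bytes, value_bytes, n_val):
    """Lower-bound estimate: one live entry per key, replicated n_val times."""
    per_entry = HEADER_BYTES + key_bytes + value_bytes
    return per_entry * num_entries * n_val


# Numbers from the message: 36-byte keys, 36-byte values, n_val of 3.
total = estimate_disk_bytes(183915891, 36, 36, 3)
print(total, "bytes")
print(total / 2**30, "GiB")
```

Since this counts only one live copy of each entry, it is a floor on disk usage; stale entries that have not been merged yet would sit on top of it.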