Hi Justin,

Thanks for the reply. Good to know you may have some partial solutions for the sizing of items. Our use case may, long term, require us to write our own backend just for space efficiency, but I'm hoping we can get quite far with bitcask. I've got enough on my plate at the moment, so I'm unlikely to get to this before you guys do, but as I recently forked riak_kv for something else, you never know.

Thanks

-Anthony

On Wed, May 25, 2011 at 08:10:29PM -0400, Justin Sheehy wrote:
> Hi, Anthony.
>
> There are really three different things below:
>
> 1- reducing the minimum overhead of the {Bucket, Key} encoding when
> riak is storing into bitcask
>
> 2- reducing the size of the vector clock encoding
>
> 3- reducing the size of the overall riak_object structure and metadata
>
> All three of these are worth doing.  The reason they are the way they
> are now is that the initial assumptions for most Riak deployments was
> of a high enough mean object size that these few bytes per object
> would proportionally be small noise -- but that's just history and not
> a reason to avoid improvements.
>
> In fact, preliminary work has been done on all three of these.  It
> just hasn't yet been such a high priority that it got pushed through
> to the finish.  One tricky part with all three is backward
> compatibility, as most production Riak clusters do not expect to need
> a full stop every time we want to make an improvement like these.
>
> Solving #1, by the way, isn't really in bitcask itself but rather in
> riak_kv_bitcask_backend.  I can take a swing at that (with backward
> compatibility) shortly.  I might also be able to help dig up some of
> the old work on #2 that is nearly a year old, and I think Andy Gross
> may have done some of what's needed for #3.
>
> With less words: I agree, all this should be made smaller.
>
> And don't let this stop you if you want to jump ahead and give some of it a
> try!
>
> -Justin
>
>
>
> On Wed, May 25, 2011 at 1:50 PM, Anthony Molinaro
> <antho...@alumni.caltech.edu> wrote:
>
> > Anyway, things make a lot more sense now, and I'm thinking I may need
> > to fork bitcask and get rid of some of that extra overhead.  For instance
> > 13 bytes of overhead to store a tuple of binaries seems unnecessary, it's
> > probably better to just have a single binary with the bucket size as a
> > prefix, so something like
> >
> >   <<BucketSize:16,Bucket,Key>>
> >
> > That way you turn 13 bytes of overhead to 2.
> >
> > Of course I'd need some way to work with old data, but a one time migration
> > shouldn't be too bad.
> >
> > It also seems like there should be some way to trim down some of that on
> > disk usage.  I mean 300+ bytes to store 36 bytes is a lot.

-- 
------------------------------------------------------------------------
Anthony Molinaro                   <antho...@alumni.caltech.edu>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
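
For anyone following the arithmetic in the quoted message, here is a rough sketch of where "13 bytes to 2" could come from. It assumes the 13 bytes are the framing that Erlang's external term format adds around a 2-tuple of binaries (1 version byte, 2 bytes of tuple header, and a 5-byte header per binary), and it compares that with the proposed <<BucketSize:16,Bucket,Key>> layout. It is written in Python purely for illustration; the function names (tuple_key, prefixed_key, split_prefixed_key) are made up for this sketch and are not part of riak_kv or bitcask.

```python
import struct


def etf_binary(data: bytes) -> bytes:
    # BINARY_EXT in the Erlang external term format:
    # tag byte 109, 4-byte big-endian length, then the raw bytes.
    return bytes([109]) + struct.pack(">I", len(data)) + data


def tuple_key(bucket: bytes, key: bytes) -> bytes:
    # Assumed current layout: the external term format of {Bucket, Key}.
    # 131 = format version, 104 = SMALL_TUPLE_EXT, 2 = tuple arity.
    return bytes([131, 104, 2]) + etf_binary(bucket) + etf_binary(key)


def prefixed_key(bucket: bytes, key: bytes) -> bytes:
    # Proposed layout: 16-bit big-endian bucket size, then bucket, then key.
    if len(bucket) > 0xFFFF:
        raise ValueError("bucket name does not fit a 16-bit size prefix")
    return struct.pack(">H", len(bucket)) + bucket + key


def split_prefixed_key(blob: bytes) -> tuple[bytes, bytes]:
    # Inverse of prefixed_key: read the size prefix, then slice.
    (bucket_size,) = struct.unpack_from(">H", blob)
    return blob[2:2 + bucket_size], blob[2 + bucket_size:]


if __name__ == "__main__":
    bucket, key = b"bucket", b"key"
    payload = len(bucket) + len(key)
    print("tuple overhead:   ", len(tuple_key(bucket, key)) - payload)
    print("prefixed overhead:", len(prefixed_key(bucket, key)) - payload)
```

The 16-bit prefix caps bucket names at 65535 bytes, and because the two layouts start with different bytes in most cases but not all, a migration or an explicit format marker would still be needed to tell old keys from new ones.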