Re: bitcask hash algo
On Tue, Jun 7, 2011 at 11:39 PM, Aaron Blohowiak wrote:

> As far as I can tell, bitcask c_src is using the MurmurHash2 algorithm, which
> has a known flaw ( https://sites.google.com/site/murmurhash/murmurhash2flaw ).
> While this is not *likely* to cause an issue, I was wondering if there was a
> reason that it does not use MurmurHash3? If this is not the appropriate list
> for this question, please let me know!

The reason is simple: MurmurHash3 hasn't been released yet (that I can see). :)

Once there is a final version of MurmurHash3, we'll certainly upgrade. Thanks for the pointer.

D.

--
Dave Smith
Director, Engineering
Basho Technologies, Inc.
diz...@basho.com

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
RFC: riak-python-client: Added a class for constructing key filters
Hi,

I added a class to the Python Riak client that will hopefully make building key filters easier. I wanted to solicit the opinions of other Python Riak developers to ensure that it smells right. If you have a moment, add your 2 cents to the pull request's comment thread here:

https://github.com/basho/riak-python-client/pull/29#issuecomment-1325131
Re: A script to check bitcask keydir sizes
On Thu, Mar 24, 2011 at 1:51 PM, Nico Meyer wrote:

> The bigger concern for me would be the way the bucket/key tuple is
> serialized:
>
> Eshell V5.8 (abort with ^G)
> 1> iolist_size(term_to_binary({<<>>,<<>>})).
> 13
>
> That's 13 bytes of overhead per key where only 2 bytes is needed with
> reasonable bucket/key length limits of 256 bytes each. Or if that is not
> enough, one could also use a variable-length encoding, so buckets/keys
> can be arbitrarily large and the most common cases (less than 128 bytes)
> still only use 2 bytes of overhead.

I've made a branch of bitcask that effectively does this. It uses 3 bytes per record instead of 13, saving 10 bytes (both in RAM and on disk) per element stored.

The tricky thing, however, is backward compatibility. There are many Riak installations out there with data stored in bitcask using the old key encoding, and we shouldn't force them all to do a very costly full sweep of their existing data in order to get these savings. Once we sort out the best way to manage a smooth upgrade, I will happily push out the smaller encoding.

-Justin
Re: A script to check bitcask keydir sizes
Hi Justin,

I wanted to write this earlier, but I just had too much on my plate.

On 08.06.2011 16:11, Justin Sheehy wrote:

> I've made a branch of bitcask that effectively does this. It uses 3 bytes
> per record instead of 13, saving 10 bytes (both in RAM and on disk) per
> element stored.
> --snip--

I think the possible gains of this change are fairly limited. Shaving off about 10 bytes per key, compared to 43 bytes of overhead plus, let's say, at least 10 bytes for bucket and key combined, is already less than 20 percent savings.

The savings seem even smaller if you consider the overhead imposed by the memory allocator. I wrote a small test program in C++ which allocates one million blocks of memory of a given size and prints the overhead for each allocation.
Turns out the overhead ranges from 8 to 23 bytes in a sawtooth-like pattern (on a 64-bit Linux machine):

size=56: overhead=8
size=57: overhead=23
size=58: overhead=22
size=59: overhead=21
size=60: overhead=20
size=61: overhead=19
size=62: overhead=18
size=63: overhead=17
size=64: overhead=16
size=65: overhead=15
size=66: overhead=14
size=67: overhead=13
size=68: overhead=12
size=69: overhead=11
size=70: overhead=10
size=71: overhead=9
size=72: overhead=8

Not much you can do about that, unless one wants to use unaligned memory, which one doesn't.

Cheers,
Nico
Bundler Version Requirement Problem
All -

I'm getting the following error when running the rspecs for my project:

> /Library/Ruby/Site/1.8/rubygems.rb:274:in `activate': can't activate
> builder (~> 2.1.2, runtime) for ["riak-client-0.9.4"], already activated
> builder-3.0.0 for ["savon-0.9.1"] (Gem::LoadError)

Would it be possible to relax the riak client's requirement that builder not exceed version 2.1?

Thanks,
Keith
Erlang API
Hi List,

I'm new to Riak, and I am thinking of using it to store log file / statistics information from a client application. The main benefits Riak could offer here are its map/reduce and search capabilities.

The client-side applications are developed in a variety of languages, but in the end all requests will be tunneled via a lightweight API to our own Erlang "admin" application. I've noticed that the Riak Erlang API really dispatches requests to the Riak cluster via the gen_tcp module. The logging (insert) rate could be very high, so what would be neat is if I could embed our Erlang "admin" application directly into the Riak cluster to avoid the IPC hop between the riak-erlang-client node and the Riak node(s).

Obviously I can fork the code and implement my own way of doing this, but the question is: is there an official way to embed your own code directly into a Riak cluster to avoid that extra IPC hop?

Thanks,
Matt
Fwd: Bundler Version Requirement Problem
All -

This may not be an issue. If so, I apologize for wasting your time. I'm doing further research.

- Keith

Begin forwarded message:

> From: Keith Bennett
> Date: June 8, 2011 11:24:43 AM EDT
> To: riak-users users
> Subject: Bundler Version Requirement Problem
>
> All -
>
> I'm getting the following error when running the rspecs for my project:
>
>> /Library/Ruby/Site/1.8/rubygems.rb:274:in `activate': can't activate
>> builder (~> 2.1.2, runtime) for ["riak-client-0.9.4"], already activated
>> builder-3.0.0 for ["savon-0.9.1"] (Gem::LoadError)
>
> Would it be possible to remove the riak client's requirement not to exceed
> version 2.1?
>
> Thanks,
> Keith
Re: A script to check bitcask keydir sizes
Nico Meyer wrote:

> The saving seems even smaller if you consider the overhead imposed by the
> memory allocator. I wrote a small test program in C++ which allocates one
> million blocks of memory of a given size and prints the overhead for each
> allocation. Turns out the overhead ranges from 8 to 23 bytes in a sawtooth-
> like pattern (on a 64-bit Linux machine):
> --snip--
>
> Not much you can do about that, unless one wants to use unaligned memory,
> which one doesn't.

Memory pools. Preallocation + slicing. Games to be played.

-mox
Process for acceptance of pull requests
Hi,

I had a pretty minor pull request against riak_kv which I sent a few weeks ago:

https://github.com/basho/riak_kv/pull/105

However, I've not seen any comments on it, so I wanted to understand a little better what the process for this sort of thing should be. Is a GitHub pull request the preferred mode of communication, or should we also send emails or open bugs in the bug tracker? Maybe the process is spelled out somewhere and I'm just missing it? Or maybe the process is just to submit a pull request and wait, and I'm impatient ;)

Thanks for the info,

-Anthony

--
Anthony Molinaro
Re: Pruning (merging) after storage reaches a certain size?
Hi, Steve.

Check out this page: http://wiki.basho.com/Bitcask-Configuration.html#Disk-Usage-and-Merging-Settings

Basically, a "merge trigger" must be met in order for the merge process to occur. When it does occur, it will affect all existing files that meet a "merge threshold."

One note that is relevant for your specific use: the expiry_secs parameter will cause a given item to disappear from the client API immediately after expiry, and to be cleaned up if it is in a file already being merged, but it will not currently contribute toward merge triggers or thresholds on its own if not otherwise "dead."

-Justin

On Jun 7, 2011, at 4:29 PM, Steve Webb wrote:

> Hello there.
>
> I'm loading a 2-node (1GB mem, 20GB storage, vmware VMs) riaksearch cluster
> with the spritzer twitter feed. I used the bitcask 'expiry_secs' to expire
> data after 3 days.
>
> I'm curious - I'm up to about 10GB of storage and I'm guessing that I'll be
> full in 3-4 more days of ingesting data. I have no idea if/when a merge will
> run to expire the older data.
>
> Q: Is there a method or command to force a merge at any time?
> Q: Is there a way to run a merge when the storage size reaches a specific
> threshold?
>
> - Steve
>
> --
> Steve Webb - Senior System Administrator for gnip.com
> http://twitter.com/GnipWebb
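For reference, the merge triggers and thresholds Justin mentions live in the bitcask section of app.config. The following is a sketch only; the defaults shown are assumptions based on the Bitcask documentation of this era, so verify the names and values against your installed version:

```erlang
%% app.config excerpt (hypothetical values -- check your version's docs)
{bitcask, [
    {expiry_secs, 259200},                 %% expire entries after 3 days
    {frag_merge_trigger, 60},              %% merge when a file is >60% fragmented
    {dead_bytes_merge_trigger, 536870912}, %% ...or holds >512MB of dead data
    {frag_threshold, 40},                  %% files >40% fragmented join a merge
    {dead_bytes_threshold, 134217728}      %% files with >128MB dead data join a merge
]}
```

As Justin notes, entries that are merely expired (rather than deleted) do not count toward these triggers on their own.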
Re: Ubuntu repository
Hi Jon,

Great suggestion. This is on a long(ish) list of packaging and build improvements we are working to put in place. We are aiming to have this done before the next major release.

Thanks, and keep the suggestions coming.

Mark

On Wed, Jun 1, 2011 at 10:37 PM, Jonathan Langevin wrote:

> Any chance of Basho making an Ubuntu Riak repository available anytime soon?
> A PPA via Launchpad seems pretty painless to put together.
>
> - Jon Langevin -- sent from my Android phone
Riak Recap for June 6 - 7
Afternoon, Evening, Morning to All -

For today's Recap: blog posts, some chatter from #riak, slides, code, and more.

Enjoy -

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups

--

Riak Recap for June 6 - 7
===

1) jjb, Vagabond, and ericmoritz/0 had a quick chat in #riak about how well Riak would work as a session store.

* Read here ---> https://gist.github.com/1015798

2) Q --- Is there a simple way to reload an Erlang module for map/reduce across a cluster? (from ericmoritz\0 via #riak)

A --- Assuming the module is in your code path, you can run c:nl(ModName) from the Erlang console.

3) The Bay Area Basho Crew moved into a new space in San Francisco, and you should all come visit.

* Read all about it here ---> http://blog.basho.com/2011/06/07/BashoWest-is-all-new-and-we-have-desks-for-you/

4) Rick Olson (@technoweenie) posted a whole host of materials related to the talk he gave last night at the Riak Meetup in San Francisco.

* Short blog post about the Dropbox clone he built with Riak and ZeroMQ ---> http://techno-weenie.net/2011/6/7/dropbear/
* PDF of his slides ---> http://dl.dropbox.com/u/3561619/talks/zeromq-riak-technoweenie.pdf
* Code ---> https://gist.github.com/122849a52c5b33c5d890

(Also, Rick really likes ZeroMQ. You should ask him about it if you're curious.)

5) Last reminder about the Riak talk Grant Schofield is giving tomorrow night at the Dallas Big Data Group.

* Details here ---> http://www.dfwbigdata.org/events/17758631/