Re: Expected vs Actual Bucket Behavior
I think that we are all (myself included) getting two different issues a bit mixed up in this discussion:

1: storing an implicit index of keys in the Riak key/value store

2: making buckets separate, in that a per-bucket operation's performance would not be affected by the content of other buckets

The thread started out with a request for #2, but included a suggestion to do #1. These are actually two different topics.

The first issue, implicitly storing a big index of keys, is impractical in a distributed key/value storage system that has Riak's availability goals. We are very unlikely to implement this as described in the near future. However, we very much recognize that there are many different ways that people would like to find their data. In that light, we are working on multiple different efforts that will use the Riak core to provide data storage with more than just "simple" key/value access.

The second issue, of isolating buckets, is a much simpler design choice and is also a per-backend implementation detail. We can create and provide an alternative bitcask adapter that does this. It will be a real tradeoff: in exchange for buckets not impacting each other as much, the system will consume more filehandles, be a bit less efficient at rebalancing, and will generally make buckets no longer "free". This is a reasonable tradeoff in either direction for various applications, and I support making it available as a choice. I have created a bugzilla entry to track it:
https://issues.basho.com/show_bug.cgi?id=480

I hope that this helps to clarify the issue.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
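To make the "more filehandles" and "buckets no longer free" tradeoff concrete, here is a back-of-envelope Python sketch. The ring size, node count and bucket count are illustrative assumptions, as is the worst case that every bucket has data in every partition; the function name is hypothetical. Assuming each bitcask instance keeps at least one file (its active write file) open, the instance count is a rough lower bound on the files a node holds open.

```python
import math


def bitcask_instances_per_node(ring_size: int, nodes: int, buckets: int,
                               per_bucket: bool) -> int:
    """Back-of-envelope count of bitcask instances one node would host.

    Hypothetical model: partitions are spread as evenly as possible across
    nodes (the busiest node gets the ceiling), and in the per-bucket layout
    every bucket has data in every partition, which is the worst case.
    """
    partitions_per_node = math.ceil(ring_size / nodes)
    instances_per_partition = buckets if per_bucket else 1
    return partitions_per_node * instances_per_partition


if __name__ == "__main__":
    # Illustrative numbers only: 64 partitions, 4 nodes, 50 buckets.
    shared = bitcask_instances_per_node(64, 4, 50, per_bucket=False)
    isolated = bitcask_instances_per_node(64, 4, 50, per_bucket=True)
    print(f"one bitcask per partition: {shared} instances per node")
    print(f"one bitcask per bucket per partition: {isolated} instances per node")
```

Under these assumptions the per-bucket layout multiplies the number of instances, and so the minimum number of open files, by the number of buckets holding data (16 versus 800 in the example).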
Re: Expected vs Actual Bucket Behavior
Justin,

I think we could address both 1) and 2) in another way.

The "real world" need seems to be restricting the scope of costly operations like walking a huge list of keys. Either having distinct buckets or reliable lists of keys could solve the problem.

But simply looking at Riak's (Dynamo) heritage shows that this is a solved problem on other platforms. For instance, Amazon's S3 service does actual separation of keys per bucket (which is what improving Bitcask in the way you described will achieve) but also has a more subtle and complementary solution: prefixes. When you list keys inside a bucket you can specify a key prefix that serves as a filter.

While I have no idea whether their internal implementation of prefixes is efficient, I see no reason why we can't have the same functionality in Riak and guarantee that it is efficient even in distributed cases. If keys are limited to some manageable length (say, 255-character strings), they could be indexed into a distributed (only eventually consistent, of course) ordered list or btree that could be efficiently queried for ranges. Then, for the application writer, it's just a matter of providing sane keys that allow scoping searches in several ways (e.g. per year, per year+month, per lot, etc.).

The suggestion above would not solve all indexing needs, but it would surely allow the current ones to be refactored into cases that can be solved efficiently.

-jd

2010/7/21 Justin Sheehy
> [...]
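jd's proposal rests on one property of ordered indexes: all keys sharing a prefix sit in one contiguous run, so a prefix query is a binary search plus a short forward scan. The Python sketch below shows this on a single node, with a sorted list standing in for the distributed, eventually consistent ordered list or btree he describes. The class name, the `MAX_KEY_LEN` constant and the example keys are hypothetical, and nothing here reflects an actual Riak API.

```python
from bisect import bisect_left

MAX_KEY_LEN = 255  # the "manageable length" floated in the message above


class OrderedKeyIndex:
    """Hypothetical single-node ordered key index supporting prefix scans.

    A sorted list stands in for a btree: lookups are O(log n), inserts are
    O(n), which is fine for a sketch but not for production.
    """

    def __init__(self) -> None:
        self._keys: list[str] = []

    def add(self, key: str) -> None:
        if len(key) > MAX_KEY_LEN:
            raise ValueError(f"key longer than {MAX_KEY_LEN} characters")
        i = bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)  # keep the list sorted, no duplicates

    def with_prefix(self, prefix: str) -> list[str]:
        # Keys sharing a prefix are contiguous in sorted order: binary-search
        # to the first candidate, then walk forward while the prefix matches.
        i = bisect_left(self._keys, prefix)
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


if __name__ == "__main__":
    idx = OrderedKeyIndex()
    for k in ["2010/07/lot9", "2009/12/lot7", "2010/06/lot1",
              "2010/07/lot3", "2011/01/lot2"]:
        idx.add(k)
    print(idx.with_prefix("2010/07/"))  # ['2010/07/lot3', '2010/07/lot9']
    print(idx.with_prefix("2010/"))     # ['2010/06/lot1', '2010/07/lot3', '2010/07/lot9']
```

With keys shaped like the hypothetical `year/month/lot` examples, scoping per year or per year+month is just a shorter or longer prefix, which is the key-design burden jd places on the application writer.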
Re: Expected vs Actual Bucket Behavior
Hi Justin,

Your comment for issue 480 reads: "Implement a separate bitcask backend (riak_kv_bitcask_bucket_backend?) that uses a separate bitcask per-bucket per-partition."

What is a partition here? A vnode, a physical host, or something else?

I appreciate the difficulty of item 1, and would second your solution to item 2. Item 2 is a big concern for those of us looking to take advantage of Riak as a primary datastore for realtime or near-realtime systems. Perhaps this could piggyback on some of the multibackend code you have already put in place.

Perhaps having separate bitcask instances per bucket per partition could also help with memory management, in that if one bucket were used primarily as a WOL, all of its keys wouldn't need to be memory-resident all the time.

Thank you,
Alexander

On Jul 21, 2010, at 9:31 AM, Justin Sheehy wrote:
> [...]
Re: Expected vs Actual Bucket Behavior
Hi, Alexander.

On Wed, Jul 21, 2010 at 1:36 PM, Alexander Sicular wrote:
> uses a separate bitcask per-bucket per-partition. What is a partition here? A
> vnode or a physical host or something else?

My apologies. Given that it was in our bugzilla, I let myself use some Riak-internals jargon without explanation.

In this context, a partition is a logical segment of the ring space, managed by a vnode process on a given physical host. There is a 1-to-1 mapping between a vnode process and a partition.

The idea is that right now the bitcask backend stores all data in a given partition together in a single bitcask instance. The alternative backend under discussion would break that up, such that within a partition (and thus in each vnode) there would be a bitcask instance for every bucket that had any data.

Does that help to clarify?

-Justin
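The difference between the two layouts Justin describes can be sketched with plain dictionaries standing in for bitcask instances. This Python model is hypothetical (the class and method names are not Riak's) and covers a single partition only; it counts how many entries a list-keys call has to examine under each layout.

```python
from collections import defaultdict


class SharedPartitionStore:
    """Current layout: one store per partition, keyed by (bucket, key)."""

    def __init__(self) -> None:
        self.data: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, value: bytes) -> None:
        self.data[(bucket, key)] = value

    def list_keys(self, bucket: str) -> tuple[list[str], int]:
        # Every entry in the partition is examined, whatever its bucket.
        scanned, keys = 0, []
        for b, k in self.data:
            scanned += 1
            if b == bucket:
                keys.append(k)
        return keys, scanned


class PerBucketPartitionStore:
    """Proposed layout: one store per bucket that has data in the partition."""

    def __init__(self) -> None:
        self.buckets: defaultdict[str, dict[str, bytes]] = defaultdict(dict)

    def put(self, bucket: str, key: str, value: bytes) -> None:
        self.buckets[bucket][key] = value  # store is created on first write

    def list_keys(self, bucket: str) -> tuple[list[str], int]:
        store = self.buckets.get(bucket, {})  # .get does not create a store
        return list(store), len(store)


if __name__ == "__main__":
    shared, per_bucket = SharedPartitionStore(), PerBucketPartitionStore()
    for s in (shared, per_bucket):
        for i in range(1000):
            s.put("logs", f"entry-{i}", b"...")
        for name in ("alice", "bob", "carol"):
            s.put("users", name, b"...")
    print(shared.list_keys("users"))      # scans all 1003 entries
    print(per_bucket.list_keys("users"))  # scans only the 3 "users" entries
    print(len(per_bucket.buckets), "stores in this partition")
```

The per-bucket layout makes list-keys cost proportional to the bucket's own size, at the price of one store per bucket with data in each partition, which is the filehandle tradeoff raised at the start of the thread.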
Re: Best way to back-up riak
I'm curious about this as well. Say I have a ten-node cluster. Could I just schedule a copy of each bitcask data directory every night at midnight, then restore to another ten-node cluster by dropping one of the data directories onto each new node? How close does the timing need to be? What if the data directory snapshots were taken seconds or minutes apart?

-Alan

On Mon, Jul 12, 2010 at 9:56 AM, Jason J. W. Williams <jasonjwwilli...@gmail.com> wrote:
> Hey Justin,
>
> Since Riak is lockless, what is the best approach to pulling a
> distributed FS snapshot of the bitcask files across nodes? I assume if
> they're not close to each other, you'll have an issue if you have to
> restore a cluster.
>
> -J
>
> On Sun, Jul 11, 2010 at 8:14 PM, Justin Sheehy wrote:
> > Hi, Jan.
> >
> > On Sun, Jul 11, 2010 at 8:53 PM, Jan Vincent wrote:
> >
> >> Given that riak is new in the database field, if ever I use riak in production,
> >> what would be the best way to back it up? I know that there's redundancy
> >> on the different nodes and NRW may be modifiable per request, but I'm
> >> wondering if there's a way to snapshot the dataset periodically -- at least
> >> until riak becomes provably battle tested.
> >
> > Riak is fairly battle-tested already: we were using its prior version
> > under Basho's own customer-facing applications in 2008, and a number
> > of external customers and users are in production today. That said,
> > even a solid distributed database needs to be backed up as there are
> > many reasons to have backups.
> >
> > The easiest and best way to back up Riak is, if you are using bitcask
> > (the default) as the backend, to simply back up the filesystem of your
> > nodes with whatever backup system you use for the rest of your
> > systems. Bitcask uses append-only files, and once it closes a file it
> > will never change the content of that file again. This makes it very
> > backup-friendly.
> >
> > If you are using a backend with less backup-friendly disk format (such
> > as innostore) then you can use the "riak-admin backup" command at
> > either the per-node or whole-cluster level to produce a
> > backend-independent snapshot that can be loaded back in via
> > "riak-admin restore". This method is much slower, will impose
> > additional load on your cluster when running, and requires that you
> > have a place to put the generated snapshot. However, it will work
> > regardless of backend and is also a simple if heavyweight way to
> > migrate to a cluster with a different configuration.
> >
> > -Justin
Re: Expected vs Actual Bucket Behavior
Regarding #2, I think bitcask could be modified to support an efficient list-keys-by-bucket operation fairly easily, without sacrificing free buckets.

The current bitcask stores record locators (key, file_id, file_offset) in memory in a big hash table indexed by key (the bitcask key, in Riak's case, is the Riak {bucket,key} as a binary). What if the hash table were replaced with an in-memory btree? A good implementation shouldn't take more memory than a hash table, and get/put should still be very fast. The plus side is that one could then do a range traversal of the btree to get all keys in a given bucket (assuming the right comparison function for the btree). There wouldn't be any additional overhead of extra file handles, etc., because everything for a vnode would still be stored in one bitcask instance.

What do you think?

Curtis

On Wed, Jul 21, 2010 at 6:31 AM, Justin Sheehy wrote:
> [...]
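Curtis's idea can be sketched as an ordered keydir: locators kept sorted by (bucket, key), so that one bucket's keys form a contiguous range. In the Python sketch below a sorted list with binary search stands in for the in-memory btree, and `Locator`, `OrderedKeydir` and the example buckets are hypothetical names, not bitcask's API. Comparing (bucket, key) as a tuple is one choice of "the right comparison function": it keeps bucket `users` from absorbing keys of a bucket like `users-archive`, which a naive byte-prefix scan over a concatenated key could do. Real bitcask keys are binaries, so the actual ordering would depend on how {bucket,key} is encoded.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional


class Locator(NamedTuple):
    file_id: int
    file_offset: int


class OrderedKeydir:
    """Hypothetical keydir sorted by (bucket, key) instead of hashed by key."""

    def __init__(self) -> None:
        self._keys: list[tuple[bytes, bytes]] = []  # sorted composite keys
        self._locs: list[Locator] = []              # parallel locator list

    def put(self, bucket: bytes, key: bytes, loc: Locator) -> None:
        ck = (bucket, key)
        i = bisect_left(self._keys, ck)
        if i < len(self._keys) and self._keys[i] == ck:
            self._locs[i] = loc  # a newer write supersedes the old locator
        else:
            self._keys.insert(i, ck)
            self._locs.insert(i, loc)

    def get(self, bucket: bytes, key: bytes) -> Optional[Locator]:
        ck = (bucket, key)
        i = bisect_left(self._keys, ck)
        if i < len(self._keys) and self._keys[i] == ck:
            return self._locs[i]
        return None

    def keys_in_bucket(self, bucket: bytes) -> list[bytes]:
        # (bucket, b"") sorts before every (bucket, key) entry, so start there
        # and walk forward until the bucket component changes.
        i = bisect_left(self._keys, (bucket, b""))
        out = []
        while i < len(self._keys) and self._keys[i][0] == bucket:
            out.append(self._keys[i][1])
            i += 1
        return out


if __name__ == "__main__":
    kd = OrderedKeydir()
    kd.put(b"users", b"bob", Locator(1, 0))
    kd.put(b"logs", b"2010-07-21", Locator(1, 120))
    kd.put(b"users", b"alice", Locator(2, 40))
    kd.put(b"users-archive", b"zed", Locator(2, 90))
    print(kd.keys_in_bucket(b"users"))  # [b'alice', b'bob']
    print(kd.get(b"users", b"alice"))   # Locator(file_id=2, file_offset=40)
```

A production version would need a real balanced tree (a sorted list makes inserts O(n)), but the range-traversal logic for listing one bucket stays the same.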
Riak Recap for 7/19 - 7/20
Afternoon, Evening, Morning to all,

Greetings yet again from OSCON. For today's Recap: a few Riak-related jobs, capacity planning, Riak in Oregon, and much more.

Enjoy,

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups

-

Riak Recap for 7/19 - 7/20

1) A few Riak-related job listings appeared over the last few days:
For anyone in the NYC Area ---> http://newyork.craigslist.org/mnh/sof/1850980901.html
For anyone in the San Francisco Area ---> http://bit.ly/5Dqj2G

2) seancribbs and copius had a great conversation about capacity planning and partition behavior in Riak. Go read it.
Gist here ---> http://gist.github.com/484979

3) @roidrage updated the Riak Homebrew recipe to 0.12. Just run "brew install riak". That's it. Thanks, Mathias!
Or you can check it out on github here ---> http://github.com/mxcl/homebrew/commit/2832c60366692a7ca4698dd5d72e6d0ff55a7b82

4) The Central Oregon Web Professionals Usergroup (COWPU) is hosting a Riak talk on Wed, 7/27, at 5:30PM at the G5 offices in Bend, Oregon. If you are in or around Bend, definitely check this out.
Details here ---> http://www.cowpu.com/july-meeting-riak

5) @roder updated his Riakaws repo on Github for the new 0.12 changes. Thanks, Matt!
Repo here ---> http://github.com/roder/riakaws

6) For those of you in California (and specifically the Bay Area), we announced the formation of a Riak Meetup in San Francisco.
Details here ---> http://blog.basho.com/2010/07/19/basho-west-and-the-riak-one-year-anniversary/

7) Today is the last day to register for the Riak Map-Reduce Webinar. Be there, or you'll hurt @reverri's feelings.
Details here ---> http://ow.ly/2exP9
Re: Expected vs Actual Bucket Behavior
Hmm... just created a new account to track this in the Basho bugzilla... it seems not to recognize new accounts...

-J

On Wed, Jul 21, 2010 at 7:31 AM, Justin Sheehy wrote:
> [...]