You're forgetting how awesome riak actually is. Given how riak is implemented, my patches should work without any operational headaches at all. Let me explain.
First, there was the one issue from yesterday. My initial patch didn't reuse the same partition bitcask on the same node. I've fixed that in a newer commit:

https://github.com/jtuple/riak_kv/commit/de6b83a4fb53c25b1013f31b8c4172cc40de73ed

Now, about how this all works in operation. Let's consider a simple scenario under normal riak. The key concept here is that riak's vnodes are completely independent, and that failure and partition ownership changes are handled through handoff alone.

Let's say we have an 8-partition ring with 3 riak nodes:

  n1 owns partitions 1,4,7
  n2 owns partitions 2,5,8
  n3 owns partitions 3,6

ie: Ring = (1/n1, 2/n2, 3/n3, 4/n1, 5/n2, 6/n3, 7/n1, 8/n2)

Each node runs an independent vnode for each partition it owns, and each vnode will set up its own bitcask:

  vnode 1/1: {n1-root}/data/bitcask/1
  vnode 1/4: {n1-root}/data/bitcask/4
  ...
  vnode 2/2: {n2-root}/data/bitcask/2
  ...
  vnode 3/6: {n3-root}/data/bitcask/6

Reads/writes are routed to the appropriate vnodes and to the appropriate bitcasks.

Under failure, hinted handoff comes into play. Let's have a write to preflist [1,2,3] while n2 is down/split. Since n2 is down, riak will send the write meant for partition 2 to another node, let's say n3. n3 will spawn a new vnode for partition 2, which is initially empty:

  vnode 3/2: {n3-root}/data/bitcask/2

and write the incoming write to the new bitcask.

Later, when n2 rejoins, n3 will eventually engage in handoff and send all (k,v) in its data/bitcask/2 to n2, which writes them into its own data/bitcask/2. After handing off the data, n3 will shut down its 3/2 vnode and delete the bitcask directory {n3-root}/data/bitcask/2.

Under node rebalancing / ownership changes, a similar event occurs. For example, if a new node n4 takes ownership of partition 4, then n1 will hand off its data to n4, then shut down its vnode and delete its {n1-root}/data/bitcask/4.
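To make the walkthrough above concrete, here is a small Python sketch of the hinted handoff case. It is not riak code: the Node class, route_write, and handoff are hypothetical names made up for illustration, and each bitcask is modeled as an in-memory dict rather than files on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a riak node; not riak's real data model."""
    name: str
    up: bool = True
    # partition index -> in-memory "bitcask" (key -> value), one per vnode
    bitcasks: dict = field(default_factory=dict)

    def bitcask_dir(self, partition):
        # Mirrors the {NODE-root}/data/bitcask/P layout from the walkthrough.
        return f"{{{self.name}-root}}/data/bitcask/{partition}"

    def write(self, partition, key, value):
        # The vnode's bitcask is created on demand and starts out empty.
        self.bitcasks.setdefault(partition, {})[key] = value


def route_write(owner, fallback, partition, key, value):
    """Send the write to the owner, or to the fallback node if the owner is down."""
    target = owner if owner.up else fallback
    target.write(partition, key, value)
    return target


def handoff(src, dst, partition):
    """Send every (k, v) for the partition to dst, then drop src's bitcask."""
    for key, value in src.bitcasks.get(partition, {}).items():
        dst.write(partition, key, value)
    src.bitcasks.pop(partition, None)


n2, n3 = Node("n2"), Node("n3")
n2.write(2, "a", "old")                      # normal write to the owner

n2.up = False                                # n2 is down/split
target = route_write(n2, n3, 2, "b", "new")  # lands in n3's new vnode for partition 2

n2.up = True                                 # n2 rejoins
handoff(n3, n2, 2)                           # n3 hands partition 2 back and cleans up
```

After the handoff, n2 holds both keys for partition 2 and n3 no longer has a bitcask for it, which is the end state described above. Nothing in the sketch depends on where bitcask_dir points, only on each node using one consistent location per partition.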
If you take the above scenario, and change all the directories of the form:

  {NODE-root}/data/bitcask/P

to:

  /mnt/DISK-N/NODE/bitcask/P

and allow DISK-N to be any randomly chosen directory in /mnt, then the scenario plays out exactly the same, provided that riak always selects the same DISK-N for a given P on a given node (across nodes doesn't matter, since vnodes are independent). My new commit handles this.

A simple configuration could be:

n1-vars.config:
  {bitcask_data_root, {random, ["/mnt/bitcask/disk1/n1",
                                "/mnt/bitcask/disk2/n1",
                                "/mnt/bitcask/disk3/n1"]}}

n2-vars.config:
  {bitcask_data_root, {random, ["/mnt/bitcask/disk1/n2",
                                "/mnt/bitcask/disk2/n2",
                                "/mnt/bitcask/disk3/n2"]}}

(...etc...)

There is no inherent need for symlinks, or for pre-creating any initial links per partition index. riak already creates and deletes partition bitcask directories on demand.

If a disk fails, then all vnodes with bitcasks on that disk fail in the same manner as a disk failure under normal riak. Standard read repair, handoff, and node replacement apply.

-Joe

On Tue, Mar 22, 2011 at 9:53 AM, Alexander Sicular <sicul...@gmail.com> wrote:
> Ya, my original message just highlighted the standard 0,1,5 that most
> people/hardware should know/be able to support. There are better options and
> 10 would be one of them.
>
> @siculars on twitter
> http://siculars.posterous.com
> Sent from my iPhone
>
> On Mar 22, 2011, at 8:43, Ryan Zezeski <rzeze...@gmail.com> wrote:
>
> On Tue, Mar 22, 2011 at 10:01 AM, Alexander Sicular <sicul...@gmail.com>
> wrote:
>>
>> Save your ops dudes the headache and just use raid 5 and be done with it.
>>
>
> Depending on the number of disks available I might even argue running
> software RAID 10 for better throughput and less chance of data loss (as long
> as you can afford to cut your avail storage in half on every machine). It's
> not too hard to setup on modern Linux distros (mdadm); at least I was doing
> it 5 years ago and I'm no sys admin.
> -Ryan

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com