Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
I think that we are all (myself included) getting two different issues
a bit mixed up in this discussion:

1: storing an implicit index of keys in the Riak key/value store

2: making buckets separate in that a per-bucket operation's
performance would not be affected by the content of other buckets

The thread started out with a request for #2, but included a
suggestion to do #1.  These are actually two different topics.

The first issue, implicitly storing a big index of keys, is
impractical in a distributed key/value storage system that has Riak's
availability goals.  We are very unlikely to implement this as
described in the near future.  However, we very much recognize that
there are many different ways that people would like to find their
data.  In that light, we are working on multiple different efforts
that will use the Riak core to provide data storage with more than
just "simple" key/value access.

The second issue, of isolating buckets, is a much simpler design
choice and is also a per-backend implementation detail.  We can create
and provide an alternative bitcask adapter that does this.  It will be
a real tradeoff: in exchange for buckets not impacting each other as
much, the system will consume more filehandles, be a bit less
efficient at rebalancing, and will generally make buckets no longer
"free".  This is a reasonable tradeoff in either direction for various
applications, and I support making it available as a choice.  I have
created a bugzilla entry to track it:
https://issues.basho.com/show_bug.cgi?id=480

I hope that this helps to clarify the issue.

-Justin

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread John D. Rowell
Justin, I think we could address both 1) and 2) in another way. The "real
world" need seems to be restricting the scope of costly operations like
walking a huge list of keys. Either having distinct buckets or reliable
lists of keys could solve the problem.

But simply looking up the (Dynamo) heritage for Riak shows that this is a
solved problem in other platforms. For instance Amazon's S3 service does
actual separation of keys per bucket (which is what improving Bitcask in the
way you described will achieve) but also has a more subtle, and
complementary solution: prefixes. When you search inside a bucket you can
specify a key prefix that will serve as a filter.

While I have no idea if their internal implementation of prefixes is
efficient, I see no reason why we can't have the same functionality in Riak
and guarantee it is efficient even in distributed cases. If keys are limited
to some manageable length (say 255 char strings) they could be indexed into
a distributed (only eventually consistent of course) ordered list or btree
that could be efficiently queried for ranges. Then to the application writer
it's just a matter of providing sane keys that will allow for scoping
searches in several ways (e.g. per year, per year+month, per lot, etc.).

The suggestion above would not solve all indexing needs but would surely
allow for the current ones to be refactored into cases that would be
efficiently solvable.
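
To make the prefix idea concrete, here is a minimal single-process sketch in
Python. It is only an illustration under stated assumptions: the class name
OrderedKeyIndex and its methods are hypothetical, a sorted list stands in for
the ordered list or btree, and nothing here is distributed or part of Riak.
The point is just that once keys are kept in order, a prefix query is one
binary search followed by a short forward scan.

```python
import bisect


class OrderedKeyIndex:
    """Hypothetical in-memory ordered index of string keys (not a Riak API)."""

    def __init__(self):
        self._keys = []  # kept sorted at all times

    def add(self, key):
        # Insert the key at its sorted position, ignoring duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def with_prefix(self, prefix):
        # All keys sharing a prefix are contiguous in sorted order, so
        # binary-search to the first candidate and scan until the prefix
        # no longer matches.
        i = bisect.bisect_left(self._keys, prefix)
        matches = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            matches.append(self._keys[i])
            i += 1
        return matches


index = OrderedKeyIndex()
for k in ["2010-07-21/lot1", "2009-12-31/lot0", "2010-08-01/lot3", "2010-07-22/lot2"]:
    index.add(k)

july_keys = index.with_prefix("2010-07")  # scope: per year+month
year_keys = index.with_prefix("2010")     # scope: per year
```

With keys shaped like year-month-day/lot, the same index answers per-year and
per-year+month queries without walking unrelated keys.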

-jd



Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Alexander Sicular
Hi Justin, 

Your comment for issue 480 reads: "Implement a separate bitcask backend
(riak_kv_bitcask_bucket_backend?) that uses a separate bitcask per-bucket
per-partition." What is a partition here? A vnode, a physical host, or
something else? I appreciate the difficulty of item 1 and would second your
solution to item 2. Item 2 is a big concern for those of us looking to take
advantage of Riak as a primary datastore for realtime or near-realtime
systems. Perhaps it could piggyback on some of the multibackend code you
have already put in place. Perhaps having multiple bitcask instances per
bucket per partition could also help with memory management, in that if one
bucket were used primarily as a WOL, all of its keys wouldn't need to be
memory resident all the time.

Thank you, Alexander





Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Justin Sheehy
Hi, Alexander.

On Wed, Jul 21, 2010 at 1:36 PM, Alexander Sicular wrote:

> uses a separate bitcask per-bucket per-partition. What is a partition here? A
> vnode or a physical host or something else?

My apologies.  Given that it was in our bugzilla I let myself use some
Riak-internals jargon without explanation.

In this context, a partition is a logical segment of the ring space,
managed by a vnode process on a given physical host.  There is a
1-to-1 mapping between a vnode process and a partition.

The idea is that right now the bitcask backend stores all data in a
given partition together in a single bitcask instance.  The
alternative backend under discussion would break that up, such that
within a partition (and thus in each vnode), there would be a bitcask
instance for every bucket that had any data.
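
As a rough sketch of the difference (Python, with plain dicts standing in for
bitcask instances; the class and method names below are hypothetical and are
not the real backend API), the two layouts within one partition look like
this:

```python
class SharedPartitionStore:
    """Current layout: one store per partition, all buckets mixed together."""

    def __init__(self):
        self._data = {}  # (bucket, key) -> value

    def put(self, bucket, key, value):
        self._data[(bucket, key)] = value

    def list_keys(self, bucket):
        # Every key in the partition is examined, whatever bucket it is in.
        return sorted(k for (b, k) in self._data if b == bucket)

    def instance_count(self):
        return 1


class PerBucketPartitionStore:
    """Alternative layout: one store per bucket within the partition."""

    def __init__(self):
        self._instances = {}  # bucket -> {key -> value}

    def put(self, bucket, key, value):
        # The first write to a bucket creates that bucket's own instance.
        self._instances.setdefault(bucket, {})[key] = value

    def list_keys(self, bucket):
        # Only the requested bucket's instance is touched.
        return sorted(self._instances.get(bucket, {}))

    def instance_count(self):
        # Grows with the number of buckets holding data, which is why
        # buckets stop being "free" (more open files per vnode).
        return len(self._instances)


shared, isolated = SharedPartitionStore(), PerBucketPartitionStore()
for store in (shared, isolated):
    store.put("logs", "a", 1)
    store.put("logs", "b", 2)
    store.put("users", "alice", 3)
```

Both layouts return the same answers; what changes is how much data a
per-bucket operation has to look at and how many instances each vnode keeps
open.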

Does that help to clarify?

-Justin



Re: Best way to back-up riak

2010-07-21 Thread Alan McConnell
I'm curious about this as well.  Say I have a ten node cluster.  Could I
just schedule a midnight copy of each bitcask data directory every night,
then restore to another ten node cluster by dropping one of the data
directories on each new node?  How close does the timing need to be?  What
if the data directory snapshots were taken seconds or minutes apart?

-Alan


On Mon, Jul 12, 2010 at 9:56 AM, Jason J. W. Williams <
jasonjwwilli...@gmail.com> wrote:

> Hey Justin,
>
> Since Riak is lockless, what is the best approach to pulling a
> distributed FS snapshot of the bitcask files across nodes? I assume if
> they're not close to each other, you'll have an issue if you have to
> restore a cluster.
>
>
> -J
>
> On Sun, Jul 11, 2010 at 8:14 PM, Justin Sheehy wrote:
> > Hi, Jan.
> >
> > On Sun, Jul 11, 2010 at 8:53 PM, Jan Vincent wrote:
> >
> >> Given that riak is new in the database field, if ever I use riak in production,
> >> what would be the best way to back it up? I know that there's redundancy
> >> on the different nodes and NRW may be modifiable per request, but I'm
> >> wondering if there's a way to snapshot the dataset periodically -- at least
> >> until riak becomes provably battle tested.
> >
> > Riak is fairly battle-tested already: we were using its prior version
> > under Basho's own customer-facing applications in 2008, and a number
> > of external customers and users are in production today.  That said,
> > even a solid distributed database needs to be backed up as there are
> > many reasons to have backups.
> >
> > The easiest and best way to back up Riak is, if you are using bitcask
> > (the default) as the backend, to simply back up the filesystem of your
> > nodes with whatever backup system you use for the rest of your
> > systems.  Bitcask uses append-only files, and once it closes a file it
> > will never change the content of that file again.  This makes it very
> > backup-friendly.
> >
> > If you are using a backend with less backup-friendly disk format (such
> > as innostore) then you can use the "riak-admin backup" command at
> > either the per-node or whole-cluster level to produce a
> > backend-independent snapshot that can be loaded back in via
> > "riak-admin restore".  This method is much slower, will impose
> > additional load on your cluster when running, and requires that you
> > have a place to put the generated snapshot.  However, it will work
> > regardless of backend and is also a simple if heavyweight way to
> > migrate to a cluster with a different configuration.
> >
> > -Justin


Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Curtis Caravone
Regarding #2, I think bitcask could be modified fairly easily to support
efficiently listing keys by bucket, without sacrificing free buckets:

The current bitcask stores record locators (key, file_id, file_offset) in
memory in a big hash table by key (the bitcask key, in Riak's case, is the
Riak {bucket,key} as a binary).

What if the hash table were replaced with an in-memory btree?  A good
implementation shouldn't take more memory than a hash table, and get/put
should still be very fast.  The plus side is that one could then do a range
traversal of the btree to get all keys in a given bucket (assuming the right
comparison function for the btree).  There wouldn't be any additional
overhead of extra file handles, etc. because everything for a vnode would
still be stored in one bitcask instance.
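
A minimal sketch of that idea in Python, under loud assumptions: a sorted
list searched with bisect stands in for the in-memory btree (so inserts here
are O(n) rather than O(log n)), the class name OrderedKeyTable is
hypothetical, and tuple ordering on (bucket, key) plays the role of the
"right comparison function". It is not bitcask's actual data structure.

```python
import bisect


class OrderedKeyTable:
    """Hypothetical ordered map of (bucket, key) -> (file_id, file_offset)."""

    def __init__(self):
        self._keys = []      # sorted list of (bucket, key) tuples
        self._locators = []  # parallel list of (file_id, file_offset)

    def put(self, bucket, key, file_id, file_offset):
        entry = (bucket, key)
        i = bisect.bisect_left(self._keys, entry)
        if i < len(self._keys) and self._keys[i] == entry:
            self._locators[i] = (file_id, file_offset)  # overwrite existing
        else:
            self._keys.insert(i, entry)
            self._locators.insert(i, (file_id, file_offset))

    def get(self, bucket, key):
        entry = (bucket, key)
        i = bisect.bisect_left(self._keys, entry)
        if i < len(self._keys) and self._keys[i] == entry:
            return self._locators[i]
        return None

    def keys_in_bucket(self, bucket):
        # (bucket,) sorts before every (bucket, key), so this lands on the
        # first entry of the bucket; then walk forward until the bucket ends.
        i = bisect.bisect_left(self._keys, (bucket,))
        found = []
        while i < len(self._keys) and self._keys[i][0] == bucket:
            found.append(self._keys[i][1])
            i += 1
        return found


table = OrderedKeyTable()
table.put(b"users", b"bob", 1, 128)
table.put(b"logs", b"2010-07-21", 1, 0)
table.put(b"users", b"alice", 2, 64)
table.put(b"users", b"bob", 3, 0)  # a newer write replaces the old locator
```

Listing the keys of one bucket then touches only that bucket's contiguous
range, while everything for the vnode still lives in a single table.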

What do you think?

Curtis



Riak Recap for 7/19 - 7/20

2010-07-21 Thread Mark Phillips
Afternoon, Evening, Morning to all,

Greetings yet again from OSCON. For today's Recap: A few Riak-related
jobs, capacity planning, Riak in Oregon and much more.

Enjoy,

Mark

Community Manager
Basho Technologies
wiki.basho.com
twitter.com/pharkmillups

-

Riak Recap for 7/19 - 7/20

1) A few Riak-related job listings appeared over the last few days:

For anyone in the NYC Area --->
http://newyork.craigslist.org/mnh/sof/1850980901.html

For anyone in the San Francisco Area ---> http://bit.ly/5Dqj2G

2) seancribbs and copius had a great conversation about capacity
planning and partition behavior in Riak. Go read it.

Gist here ---> http://gist.github.com/484979

3) @roidrage updated the Riak Homebrew recipe to 0.12. "brew install
riak" That's it. Thanks, Mathias!

Or you can check it out on github here --->
http://github.com/mxcl/homebrew/commit/2832c60366692a7ca4698dd5d72e6d0ff55a7b82

4) Central Oregon Web Professionals Usergroup (COWPU) is hosting a Riak
talk on Wed, 7/27, at 5:30PM. It's happening at the G5 offices in Bend,
Oregon. If you are in or around Bend, definitely check this out.

Details here ---> http://www.cowpu.com/july-meeting-riak

5) @roder updated his Riakaws repo on Github to the new 0.12 changes.
Thanks, Matt!

Repo here --->  http://github.com/roder/riakaws

6) For those of you in California (and specifically the Bay Area), we
announced the formation of Riak Meetup in San Francisco.

Details here --->
http://blog.basho.com/2010/07/19/basho-west-and-the-riak-one-year-anniversary/

7) Today is the last day to register for the Riak Map-Reduce Webinar.
Be there, or you'll hurt @reverri's feelings.

Details here ---> http://ow.ly/2exP9



Re: Expected vs Actual Bucket Behavior

2010-07-21 Thread Jason J. W. Williams
Hmm...just created a new account to track this in the Basho
bugzilla...seems not to recognize new accounts...

-J
