Re: Getting all the Keys

Eric Moritz Sun, 23 Jan 2011 10:02:01 -0800

This is the best way for me to understand how to model data in Riak.  Think
about the web. You always have a starting point. The starting point is a
URL. A URL is analogous to a key in Riak. A URL gets you a document on the
web; a key gets you a document in Riak.

Now, the web page addressed by your URL has other URLs that serve as
pointers to other pages.

In Riak, your starting doc(s) need to have references to other documents to
relate them.
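
A rough sketch of that idea in Python. A plain dict stands in for a Riak
bucket here, so nothing below is a real Riak client call, and the key names
and the post_keys field are made up for illustration:

```python
# A plain dict stands in for a Riak bucket; no real Riak client is used.
bucket = {
    "user:eric": {"name": "Eric", "post_keys": ["post:1", "post:2"]},
    "post:1": {"title": "Modeling data in Riak"},
    "post:2": {"title": "Thinking in links"},
}


def fetch(key):
    """Get one document by key, like following a single URL."""
    return bucket[key]


def fetch_related(start_key, ref_field):
    """Start from a known key and follow the key references it stores."""
    start = fetch(start_key)
    return [fetch(k) for k in start[ref_field]]


titles = [doc["title"] for doc in fetch_related("user:eric", "post_keys")]
print(titles)
```

The point is that nothing ever asks the bucket "what keys do you have?".
Every lookup starts from a key we already know and walks outward.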

In fact, both systems, when defined this way, are extremely relational. They
diverge from relational databases in that there's tons of redundant data and
no built-in integrity checking.

On the web this results in 404s and inconsistent titles in <a> tags.  The same
problems happen in Riak; deleted keys could still be referenced by other
documents.  It's all reminiscent of dangling pointers in my CS classes.
Just as in that old C code I had to write, there is a lot of housekeeping to
do to make sure integrity is preserved.
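
To make the dangling-reference problem concrete, here is a small sketch under
the same assumption as before: a dict pretending to be a bucket, with
hypothetical key names and a hypothetical post_keys field. It is not how any
Riak client works, just the shape of the housekeeping:

```python
# A plain dict stands in for a Riak bucket; names are illustrative only.
bucket = {
    "user:eric": {"post_keys": ["post:1", "post:2"]},
    "post:1": {"title": "first"},
    "post:2": {"title": "second"},
}


def resolve(start_key):
    """Follow references, separating live documents from dangling keys."""
    found, dangling = [], []
    for key in bucket[start_key]["post_keys"]:
        if key in bucket:
            found.append(bucket[key])
        else:
            dangling.append(key)  # the Riak version of a 404
    return found, dangling


def delete_with_housekeeping(key, referrer_key):
    """Delete a document and also drop the reference its referrer holds."""
    bucket.pop(key, None)
    refs = bucket[referrer_key]["post_keys"]
    bucket[referrer_key]["post_keys"] = [k for k in refs if k != key]


# Naive delete: the referring document still points at the missing key.
del bucket["post:2"]
found, dangling = resolve("user:eric")

# Delete with housekeeping: the reference is removed along with the document.
delete_with_housekeeping("post:1", "user:eric")
found_after, dangling_after = resolve("user:eric")
```

Nothing in the store stops the naive delete, so the application has to either
clean up references when it deletes or tolerate missing keys when it reads.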

Unlike the web, with Riak, we have complete control over what links to
what.  Unfortunately, managing those links adds complexity to applications
that otherwise would be simple in an ACID DB. The benefit of this added
complexity is AP (availability and partition tolerance, in CAP terms).

The folks who wrote the Dynamo paper state that, at least for them, this
added complexity was negligible because they were already designing their
services to compensate for integrity issues. Unfortunately for most of us,
our SQL databases let us ignore those issues.

tl;dr think of Riak like the web. The web interrelates pages using URLs; we
have to design our app's Riak docs similarly using key references.
On Jan 22, 2011 8:22 PM, "Sean Cribbs" <s...@basho.com> wrote:
> On Jan 22, 2011, at 4:15 PM, Thomas Burdick wrote:
>
>> * Why is key listing so slow?
>
> It is slow because, even if the keys are in RAM, you have to scan roughly
all of the keys in the cluster to get a listing for a single bucket. As a
certain person is fond of saying, "full table scan is full table scan".
There are ways to improve this, but without single-arbiters of state (and
points of failure) it is very costly.
>
>> * What do people do in the context of purely using riak to do what I
want, have a big set of keys to iterate over?
>
> As others have said so eloquently, they don't, they use something else. Or
they try to minimize how frequently they do it. Part of the current
revolution in data storage is about realizing that no one tool is going to
completely fit your needs, and that that's good and right. Anyone who tells
you otherwise is selling you a bill of goods.
>
> To understand why listing keys is difficult, you have to understand Riak's
(and Dynamo's) original design motivations:
>
> * To be basically available at all times for reads and writes, which in
turn means to be tolerant of machine and network failures.
> * To provide low-latency random access to large data sets. (Note I didn't
say an entire data set.)
> * To scale linearly with minimal operational complexity.
>
> Everything has tradeoffs - these are the ones we chose with Riak. Now, we
(Basho) are actively trying to create ways to make discovering your data
easier (key-filters are one of them, as Justin mentioned we're discussing
counters and indices), but the majority of people who use Riak have ways of
discovering or knowing keys ahead of time. If that's not your case, you
should look into other solutions; some good ones have been mentioned in this
thread. That said, we hear your pain and are working hard to improve
usability while maintaining the properties discussed above.
>
> Cheers,
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

