This is the best way for me to understand how to model data in Riak. Think about the web. You always have a starting point, and that starting point is a URL. A URL is analogous to a key in Riak: a URL gets you a document on the web, and a key gets you a document in Riak.
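To make the analogy concrete, here is a minimal sketch of how I picture it. It assumes a plain Python dict as a stand-in for a Riak bucket; it does not use the Riak client API, and the key names and the `links` field are made up for illustration. A document is fetched by key, and the keys it references are followed the way a browser follows links, including the case where a referenced key no longer exists.

```python
# A plain dict stands in for a Riak bucket; this is NOT the Riak client API.
# Key names and the "links" field are hypothetical.
bucket = {
    "user:alice": {"name": "Alice", "links": ["post:1", "post:2"]},
    "post:1": {"title": "Hello", "links": []},
    # "post:2" was deleted, but "user:alice" still references it.
}


def fetch(key):
    """Return the document stored under key, or None if it is missing."""
    return bucket.get(key)


def follow_links(key):
    """Fetch a document and resolve the keys it references.

    Returns (found, dangling): found maps each referenced key to its
    document; dangling lists referenced keys that no longer exist
    (the key/value equivalent of a 404).
    """
    doc = fetch(key)
    if doc is None:
        raise KeyError(key)
    found, dangling = {}, []
    for ref in doc.get("links", []):
        target = fetch(ref)
        if target is None:
            dangling.append(ref)
        else:
            found[ref] = target
    return found, dangling


found, dangling = follow_links("user:alice")
```

Nothing here scans the bucket: every lookup starts from a key you already know or one you found inside a document.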
Now, the web page addressed by your URL contains other URLs that serve as pointers to other pages. In Riak, your starting doc(s) need to hold references to other documents to relate them. In fact, both systems, when defined this way, are extremely relational. They diverge from relational databases in that there's tons of redundant data and no built-in integrity checking. On the web this results in 404s and inconsistent titles in <a> tags. The same problems happen in Riak: deleted keys could still be referenced by other documents. It's all reminiscent of dangling pointers from my CS classes. Just as in that old C code I had to write, there's a lot of housekeeping to make sure integrity is preserved. Unlike the web, with Riak we have complete control over what links to what. Unfortunately, it adds complexity to applications that would otherwise be simple in an ACID DB. The benefit of this added complexity is AP (availability and partition tolerance). The folks who wrote the Dynamo paper state that, at least for them, this added complexity was negligible because they were already designing their services to compensate for integrity issues. Unfortunately for most of us, our SQL databases let us ignore those issues.

tl;dr: think of Riak like the web. The web interrelates pages using URLs; we have to design our app's Riak docs similarly, using key references.

On Jan 22, 2011 8:22 PM, "Sean Cribbs" <s...@basho.com> wrote:
> On Jan 22, 2011, at 4:15 PM, Thomas Burdick wrote:
>
>> * Why is key listing so slow?
>
> It is slow because, even if the keys are in RAM, you have to scan roughly all of the keys in the cluster to get a listing for a single bucket. As a certain person is fond of saying, "full table scan is full table scan". There are ways to improve this, but without single-arbiters of state (and points of failure) it is very costly.
>
>> * What do people do in the context of purely using riak to do what I want, have a big set of keys to iterate over?
>
> As others have said so eloquently, they don't, they use something else. Or they try to minimize how frequently they do it. Part of the current revolution in data storage is about realizing that no one tool is going to completely fit your needs, and that that's good and right. Anyone who tells you otherwise is selling you a bill of goods.
>
> To understand why listing keys is difficult, you have to understand Riak's (and Dynamo's) original design motivations:
>
> * To be basically available at all times for reads and writes, which in turn means to be tolerant of machine and network failures.
> * To provide low-latency random access to large data sets. (Note I didn't say an entire data set.)
> * To scale linearly with minimal operational complexity.
>
> Everything has tradeoffs - these are the ones we chose with Riak. Now, we (Basho) are actively trying to create ways to make discovering your data easier (key-filters are one of them, as Justin mentioned we're discussing counters and indices), but the majority of people who use Riak have ways of discovering or knowing keys ahead of time. If that's not your case, you should look into other solutions; some good ones have been mentioned in this thread. That said, we hear your pain and are working hard to improve usability while maintaining the properties discussed above.
>
> Cheers,
>
> Sean Cribbs <s...@basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com