*Potatoes*: First, a hello to the list ;)

*Meat*: I recently became interested in NoSQL solutions, so the following may stem from my ignorance of this new type of DB schema design. Still, I thought it was worth mentioning...
To preface, I'm looking at NoSQL solutions to solve the "Big Data" problem for a limited data set, rather than using Riak exclusively for storage. My proposed schema for Riak consists of a bucket per collection, per user.

The current behavior of Riak, when retrieving the contents of any given bucket, requires all objects to be examined to determine their bucket, effectively m/r'ing the whole data set down to your result set. This seems to me to be quite a costly operation, and the logical workaround is to store a separate k/v pair that contains an index of the keys in a bucket. I would think that retrieving all objects in a bucket is a _very_ commonplace requirement in modern web development and perhaps (depending on requirements) _the_ most common operation aside from retrieving a single k/v pair.

In my mind, this leaves namespacing as the only advantage of buckets in this application... While that is certainly important, I'm fuzzy on what the downside would be to letting buckets exist as a separate partition/pseudo-table/etc., so that retrieving all objects in a bucket would not need to read every object in the entire system, especially considering how common the usage is...

It also seems to me that this would more closely mimic a real-world "bucket", so the behavior the terminology suggests would be much closer to the actual behavior. If I'm ever examining a bucket, I'm looking at that one bucket, never at all objects to see which bucket they're in. I wouldn't use the term "bucket" for the functionality as it currently stands, because the keys aren't in buckets at all; they're all global. For all intents and purposes bucket == namespace in Riak, while "bucket" implies (to me) something more than just a namespace. This idea may stem from my limited knowledge of NoSQL storage engines, but I do feel it has some merit, especially when trying to garner support from the development community.
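For what it's worth, the workaround mentioned above (a separate k/v pair holding an index of a bucket's keys) could be sketched roughly as follows. This is only an illustration under stated assumptions: `store` is a plain Python dict standing in for Riak's flat key space, not the Riak client API, and `put`, `delete`, `list_keys_by_scan`, `list_keys_by_index` and the reserved `_key_index` key are hypothetical names made up for this sketch.

```python
import json

# Stand-in for a flat, global key space: every object lives under (bucket, key).
# This is a plain dict, NOT the Riak client API.
store = {}

INDEX_KEY = "_key_index"  # hypothetical reserved key holding the bucket's index


def _read_index(bucket):
    """Load the set of keys recorded in the bucket's index object."""
    return set(json.loads(store.get((bucket, INDEX_KEY), "[]")))


def _write_index(bucket, keys):
    """Persist the index as a sorted JSON list under the reserved key."""
    store[(bucket, INDEX_KEY)] = json.dumps(sorted(keys))


def put(bucket, key, value):
    """Store a value and record its key in the bucket's index object."""
    store[(bucket, key)] = value
    keys = _read_index(bucket)
    keys.add(key)
    _write_index(bucket, keys)


def delete(bucket, key):
    """Remove a value and drop its key from the bucket's index object."""
    store.pop((bucket, key), None)
    keys = _read_index(bucket)
    keys.discard(key)
    _write_index(bucket, keys)


def list_keys_by_scan(bucket):
    """Full scan: look at every object in the system to find one bucket's keys."""
    return sorted(k for (b, k) in store if b == bucket and k != INDEX_KEY)


def list_keys_by_index(bucket):
    """Single read of the index object, regardless of how many objects exist."""
    return json.loads(store.get((bucket, INDEX_KEY), "[]"))


put("eric_photos", "beach.jpg", "...")
put("eric_photos", "alps.jpg", "...")
put("bob_photos", "cat.jpg", "...")
print(list_keys_by_index("eric_photos"))  # ['alps.jpg', 'beach.jpg']
```

Both listing functions return the same keys; the difference is that the scan touches every object in every bucket, while the index costs one read. The catch is that each write becomes a read-modify-write of the index object, so two clients writing to the same bucket at once could clobber each other's index update, and a real version would need some way to resolve that.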
Short of changing Riak to fit this proposed model, and because querying for the contents of a single bucket is so common, I might recommend a hybrid solution (based on my limited knowledge of Riak)... What about allowing a bucket property named something like "key_index" that points to a key whose value is the list of keys in the bucket? Then, when calling GET /riak/bucket, Riak would use the key_index to immediately reduce its result set before applying m/r funcs. While I understand this is essentially what a developer would do by hand, it would certainly alleviate some code requirements (application side) and make retrieving a bucket's contents more "expected" and efficient.

Anyway, information on Riak is pretty limited right now, seeing as it's so new, but talk in my development circles is very positive and lively. I thought this might be the best place to pose my question / suggestion and get some feedback.

-Eric
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com