On 7/20/10 6:00 PM, Eric Filson wrote:
On Tue, Jul 20, 2010 at 3:02 PM, Justin Sheehy <jus...@basho.com> wrote:
Hi, Eric! Thanks for your thoughts.
On Tue, Jul 20, 2010 at 12:39 PM, Eric Filson <efil...@gmail.com> wrote:
> I would think that this requirement, retrieving all objects in a
> bucket, would be a _very_ commonplace occurrence for modern web
> development and perhaps (depending on requirements) _the_ most common
> function aside from retrieving a single k/v pair.
In my experience, people mostly try to write applications that don't
select everything from a whole bucket/table/whatever, but different
people have different requirements. Certainly, it is sometimes
unavoidable.
Indeed, in my case it is :(
I've had two use cases that bumped into this limitation. In one, we are
just working around / accepting the limitation. In the other, we found
it much easier/safer to consider a different solution entirely.
> I might recommend a hybrid solution (based on my limited knowledge of
> Riak)... What about allowing a bucket property named something like
> "key_index" that points to a key containing a value of "keys in
> bucket". Then, when calling GET /riak/bucket, Riak would use the
> key_index to immediately reduce its result set before applying m/r
> funcs. While I understand this is essentially what a developer would
> do, it would certainly alleviate some code requirements (application
> side) as well as make the behavior of retrieving a bucket's contents
> more "expected" and efficient.
A much earlier incarnation of Riak actually stored bucket keylists
explicitly in a fashion somewhat like what you describe. We removed
this because one of our biggest goals is predictable and understandable
behavior in a distributed-systems sense, and a model like this one
turns each write operation into at least two operations. This isn't
just a performance issue; it also adds complexity. For instance, it is
not immediately obvious what should be returned to the client if the
data item write succeeds but the read/write of the index fails.
Haha, these are the exact reasons I, as a developer, would cite for
wanting a similar method on Riak's side... Without the option of
automatic bucket indexing, this double write is effectively pushed into
the application, where it costs more cycles and more data across the
wire. Instead of doing a single write from the application side and
letting Riak handle the bookkeeping, you have to GET index_key, UPDATE
index_key, ADD new_key... So rather than one transaction with Riak, you
have three transactions with Riak plus the supporting application
logic. Inherently, this adds another level of complexity to the
application code base for something that could be done more efficiently
by the DB engine itself.
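To make the extra round trips concrete, here is a minimal sketch of that
application-side pattern. A plain dict stands in for Riak, and the helper
names (riak_get, riak_put, put_with_index) are invented for illustration;
in a real client each helper call would be a separate HTTP round trip
against Riak's REST interface (/riak/<bucket>/<key>).

```python
import json

# A plain dict standing in for Riak; in practice each helper below is
# one HTTP round trip (GET or PUT on /riak/<bucket>/<key>).
store = {}

def riak_get(bucket, key):
    # one round trip: GET /riak/<bucket>/<key>
    return store.get((bucket, key))

def riak_put(bucket, key, value):
    # one round trip: PUT /riak/<bucket>/<key>
    store[(bucket, key)] = value

def put_with_index(bucket, key, value, index_key="key_index"):
    """Write a value AND maintain a bucket key index client-side.

    Three round trips instead of one:
      1. GET the index
      2. PUT the updated index
      3. PUT the new data item
    A failure between steps 2 and 3 leaves the index inconsistent --
    exactly the partial-failure case discussed in this thread.
    """
    raw = riak_get(bucket, index_key)              # 1. GET index_key
    keys = json.loads(raw) if raw else []
    if key not in keys:
        keys.append(key)
    riak_put(bucket, index_key, json.dumps(keys))  # 2. UPDATE index_key
    riak_put(bucket, key, value)                   # 3. ADD new_key

put_with_index("users", "alice", '{"name": "Alice"}')
put_with_index("users", "bob", '{"name": "Bob"}')
print(json.loads(riak_get("users", "key_index")))  # ['alice', 'bob']
```

The read-modify-write on the index is also a race between concurrent
writers, which is the kind of coordination the engine could do far more
cheaply on the node servicing the request.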
I would think a separate error number and message would suffice as the
return error; obviously, though, developers would need to be made aware
of it so they can code for the exception.
Also, this would be optional: if no index_key were set for the bucket,
this behavior simply wouldn't be used. That would at least make the
system more flexible to application requirements and developer
preferences.
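A rough sketch of how such an opt-in property might behave, with invented
names (list_keys, key_index, ".index") that are not real Riak API: when
the bucket property is set, listing keys is a single read of the index
object; when it is not, the engine falls back to the full-bucket scan it
performs today.

```python
# Hypothetical sketch of the proposed opt-in behavior; names are
# illustrative, not real Riak API. The store is a dict keyed by
# (bucket, key) tuples, standing in for the backend.

def list_keys(bucket_props, store, bucket):
    index_key = bucket_props.get("key_index")
    if index_key is not None:
        # Fast path: one read of the maintained index object.
        return list(store[(bucket, index_key)])
    # Fallback: the existing full scan over every key in the bucket.
    return sorted(k for (b, k) in store if b == bucket)

store_indexed = {
    ("users", "alice"): "...",
    ("users", "bob"): "...",
    ("users", ".index"): ["alice", "bob"],
}
store_plain = {
    ("users", "alice"): "...",
    ("users", "bob"): "...",
}

print(list_keys({"key_index": ".index"}, store_indexed, "users"))  # ['alice', 'bob']
print(list_keys({}, store_plain, "users"))                         # ['alice', 'bob']
```

Buckets that never set the property would pay nothing for the feature,
which is the flexibility being asked for above.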
I understand that there may be people using Riak who either never intend
to have a huge number of keys in the cluster, or who never intend to try
to map reduce over a bucket if they do.
I also understand that there are performance and complexity wins to be
had by eliminating the feature.
That said, I feel it needs to be an optional feature that the engine
itself provides. Pushing it out to the client layer severely
complicates the transaction because it is now two separate REST calls
rather than something that can be done in a tightly coupled fashion on
the node servicing the request.
-Daniel
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com