Re: Getting multiple values: is iterating or MapReduce preferred?

John Caprice Mon, 25 Mar 2013 17:15:49 -0700

Rob,

Performing GET requests, either serially or concurrently, is more efficient
than using MapReduce to query for values.  MapReduce has additional
overhead that GET requests do not have.  For example, a GET is sent only to
the nodes in the preference list for a given key, while a MapReduce query
is sent to all nodes.

There are appropriate uses of MapReduce.  Running MapReduce in a controlled
manner outside of your peak production hours can minimize its performance
impact: for example, running it nightly to perform maintenance or build
reports.  It is important to ensure that MapReduce queries remain bounded.
Replacing serial or concurrent GETs in your application with MapReduce
queries opens the door to unbounded use, which can have severe performance
consequences.

Making separate requests, either serially or concurrently, is the optimal
way to query data in Riak.  To an application developer, this might not
look as elegant, but it is much more efficient for Riak.
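
As a rough illustration of the concurrent approach, here is a minimal
sketch of a multi-get that issues one GET per key through a small thread
pool.  It is not tied to any particular client: `fetch` is a hypothetical
callable that takes a single key and returns its value, and `max_workers`
is an arbitrary cap chosen for the example so that the number of in-flight
requests stays bounded.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(fetch, keys, max_workers=8):
    """Fetch each key with its own GET, several at a time.

    `fetch` is a hypothetical callable (an assumption of this sketch) that
    takes one key and returns its value, e.g. a thin wrapper around a
    client GET.  Results are returned in the same order as `keys`.
    """
    keys = list(keys)
    if not keys:
        return []
    # Cap the pool so a long key list cannot flood the cluster.
    workers = min(max_workers, len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order.
        return list(pool.map(fetch, keys))


if __name__ == "__main__":
    # Stand-in for a real store so the sketch runs without a Riak node.
    fake_store = {"a": {"n": 1}, "b": {"n": 2}}
    print(multi_get(fake_store.get, ["a", "b", "missing"]))
```

With riak-python-client, `fetch` would presumably be a small function
that calls the bucket's `get` for the key and returns the object's data;
the exact accessor depends on the client version, so treat that part as
an assumption to check against your installed client.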

Thanks,

John

On Mon, Mar 25, 2013 at 3:07 PM, Rob Speer <r...@luminoso.com> wrote:

> I've looked at the archives of this mailing list to find a way to
> implement a "multi-get" using Riak, for the very common case where there
> are multiple keys to look up. Making a separate round-trip to the server
> for each key seems inefficient, after all.
>
> I came across the suggestion to use MapReduce, so I tried implementing it
> this way (using riak-python-client):
>
>     def multi_get(self, bucket_name, ids):
>         if len(ids) == 0:
>             return []
>         mr = RiakMapReduce(self.riak)
>         for uid in ids:
>             mr.add(bucket_name, uid)
>         query = mr.map_values_json()
>         return query.run()
>
> After this I noticed significant load on the Riak servers, and the client
> code would sometimes stall for a long time, even on a multi_get that was
> only returning 6 documents. Is this actually an inappropriate use of
> MapReduce? (And are there appropriate uses of MapReduce in NoSQL databases
> besides stress-testing them?)
>
> Is it better to make a separate request for each ID, to use MapReduce, or
> to use some other method I haven't thought of?
> -- Rob
>
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>

