This is very interesting.

I've just been testing a multi-phase MapReduce job that was intended to
replace some code where we look up several objects sequentially after a
small MapReduce job. It sounds like the current application code is the
right way to do it.
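
For anyone comparing, here is a minimal sketch of the concurrent-GET
version, assuming riak-python-client's bucket.get() / get_data() methods
(names vary between client versions) and a standard thread pool:

    from concurrent.futures import ThreadPoolExecutor

    def multi_get(client, bucket_name, ids, max_workers=10):
        # One plain GET per key, issued in parallel; each GET only
        # touches the nodes in that key's preflist.
        bucket = client.bucket(bucket_name)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            objects = list(pool.map(bucket.get, ids))
        # get_data() returns the decoded JSON value for each object;
        # in the older client, obj.exists() reports whether the key
        # was actually found.
        return [obj.get_data() for obj in objects]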


On 26 March 2013 11:14, John Caprice <jcapr...@basho.com> wrote:

> Rob,
>
> Performing GET requests, either serially or concurrently, is more
> efficient than using MapReduce to query for values.  MapReduce has
> additional overhead that GET requests do not.  One example of this is
> that a GET is sent only to the nodes in the preflist (preference list)
> for a given key, while a MapReduce query is sent to every node.
>
> There are appropriate uses of MapReduce.  Using MapReduce in a controlled
> manner outside of your peak production hours can minimize the performance
> impact: for example, using MapReduce nightly to perform maintenance,
> build reports, etc.  It is important to ensure that MapReduce queries
> remain bounded.  Replacing serial or concurrent GETs in your application
> with MapReduce queries opens the door to unbounded use, which can have
> severe performance consequences.
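>
> A bounded job lists its input keys explicitly rather than scanning a
> bucket, so the work stays proportional to the inputs.  As a minimal
> sketch, assuming the riak-python-client API from the post below, plus
> a hypothetical numeric "total" field on each stored document:
>
>     from riak import RiakMapReduce
>
>     def nightly_total(client, bucket_name, report_keys):
>         # Inputs are an explicit, bounded key list, never a whole bucket.
>         mr = RiakMapReduce(client)
>         for key in report_keys:
>             mr.add(bucket_name, key)
>         # Map: extract the (hypothetical) numeric "total" field.
>         mr.map("function(v) {"
>                " var doc = JSON.parse(v.values[0].data);"
>                " return [doc.total]; }")
>         # Reduce: Riak's built-in JavaScript sum.
>         mr.reduce("Riak.reduceSum")
>         return mr.run()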
>
> Making separate requests, either serially or concurrently, is the optimal
> way to query data in Riak.  To an application developer this might not
> look as elegant, but it is much more efficient for Riak.
>
> Thanks,
>
> John
>
>
> On Mon, Mar 25, 2013 at 3:07 PM, Rob Speer <r...@luminoso.com> wrote:
>
>> I've looked at the archives of this mailing list to find a way to
>> implement a "multi-get" using Riak, for the very common case where there
>> are multiple keys to look up. Making a separate round-trip to the server
>> for each key seems inefficient, after all.
>>
>> I came across the suggestion to use MapReduce, so I tried implementing it
>> this way (using riak-python-client):
>>
>>     from riak import RiakMapReduce
>>
>>     def multi_get(self, bucket_name, ids):
>>         if len(ids) == 0:
>>             return []
>>         # Build one MapReduce job with every requested key as an input.
>>         mr = RiakMapReduce(self.riak)
>>         for uid in ids:
>>             mr.add(bucket_name, uid)
>>         # Map phase: decode each stored value as JSON.
>>         query = mr.map_values_json()
>>         return query.run()
>>
>> After this I noticed significant load on the Riak servers, and the client
>> code would sometimes stall for a long time, even on a multi_get that was
>> only returning 6 documents. Is this actually an inappropriate use of
>> MapReduce? (And are there appropriate uses of MapReduce in NoSQL databases
>> besides stress-testing them?)
>>
>> Is it better to make a separate request for each ID, to use MapReduce, or
>> to use some other method I haven't thought of?
>> -- Rob
>>
>>
>
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
