This is very interesting. I've just been testing a multi-phase MapReduce job that was intended to replace some code where we look up several objects sequentially after a small MapReduce job. It sounds like the current application code is the right way to do it.
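For reference, the "separate concurrent requests" approach recommended in the thread below can be sketched with a thread pool. This is only a minimal illustration: `fetch_one` and `FAKE_STORE` are hypothetical stand-ins for a real Riak client GET (e.g. fetching from a bucket), not part of any actual API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a Riak bucket; in a real application
# fetch_one would issue a GET through the Riak client instead.
FAKE_STORE = {"a": {"n": 1}, "b": {"n": 2}, "c": {"n": 3}}

def fetch_one(key):
    """Simulate a single GET request for one key."""
    return FAKE_STORE.get(key)

def multi_get(keys, max_workers=10):
    """Issue one GET per key concurrently and collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_one, keys))

print(multi_get(["a", "b", "c"]))  # -> [{'n': 1}, {'n': 2}, {'n': 3}]
```

Each key still costs one round trip, but the requests overlap in time, and each GET touches only the nodes responsible for that key, unlike a MapReduce query.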
On 26 March 2013 11:14, John Caprice <jcapr...@basho.com> wrote:
> Rob,
>
> Performing GET requests either serially or concurrently is more efficient
> than using MapReduce to query for values. MapReduce has additional
> overhead that GET requests do not have. One example of this is that a GET
> is sent only to the nodes in the preference list for a given key, while a
> MapReduce query is sent to all nodes.
>
> There are appropriate uses of MapReduce. Using MapReduce in a controlled
> manner outside of your peak production hours can minimize the performance
> impact. For example, you might use MapReduce nightly to perform
> maintenance, build reports, etc. It is important to ensure that MapReduce
> queries remain bounded. Replacing serial or concurrent GETs in your
> application with MapReduce queries opens the door to unbounded use, which
> can have severe performance consequences.
>
> Making separate requests, either serially or concurrently, is the optimal
> way to query data in Riak. To an application developer this might not
> look as elegant, but it is much more efficient for Riak.
>
> Thanks,
>
> John
>
>
> On Mon, Mar 25, 2013 at 3:07 PM, Rob Speer <r...@luminoso.com> wrote:
>
>> I've looked at the archives of this mailing list to find a way to
>> implement a "multi-get" using Riak, for the very common case where there
>> are multiple keys to look up. Making a separate round trip to the server
>> for each key seems inefficient, after all.
>>
>> I came across the suggestion to use MapReduce, so I tried implementing it
>> this way (using riak-python-client):
>>
>> def multi_get(self, bucket_name, ids):
>>     if len(ids) == 0:
>>         return []
>>     mr = RiakMapReduce(self.riak)
>>     for uid in ids:
>>         mr.add(bucket_name, uid)
>>     query = mr.map_values_json()
>>     return query.run()
>>
>> After this I noticed significant load on the Riak servers, and the client
>> code would sometimes stall for a long time, even on a multi_get that was
>> only returning 6 documents.
>> Is this actually an inappropriate use of MapReduce? (And are there
>> appropriate uses of MapReduce in NoSQL databases besides stress-testing
>> them?)
>>
>> Is it better to make a separate request for each ID, to use MapReduce, or
>> to use some other method I haven't thought of?
>>
>> -- Rob
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com