On Mon, Feb 27, 2012 at 10:35 PM, <riak-users-requ...@lists.basho.com> wrote:

> From: Sreejith K <sreejith...@gmail.com>
> Subject: Re: Multiple Index Queries using Riak and Python
>
> I find this solution extremely useful in our PaaS solution where we needed
> to support APIs similar to Google App Engine. Performance is
> largely dependent on the number of key inputs to the MapReduce phase. But
> it is quite fast when you want to get a small number of records (~1000) from
> a large data set (in the millions) using filters. But when the MapReduce phase
> needs to fetch a large number of items, it is a little slow, as is to be
> expected ;-). I'll surely share our experience as we go further.
>

Looking at the code, it seems you are taking the results of each index
query and using those as the inputs of the MR job, which then performs the
filtering.  That seems inefficient, particularly if each index query
returns a very large result set but their intersection is small, since Riak
will have to fetch each object from disk, parse it, and then filter it.

Why not simply compute the intersection of all the index query results in
the client and use that as the input to the MR phase?
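
Roughly what I have in mind, as a minimal Python sketch. It assumes you
already have the key list returned by each index query; the helper names
(intersect_index_results, mapreduce_inputs), the "users" bucket, and the
sample keys are made up for illustration and are not part of your code or
of the Riak client. The only Riak-specific assumption is that the MR job
takes its inputs as [bucket, key] pairs.

```python
def intersect_index_results(result_sets):
    """Return the set of keys that appear in every index query result.

    result_sets is an iterable of key lists, one list per index query.
    """
    key_sets = [set(keys) for keys in result_sets]
    if not key_sets:
        return set()
    # Start from the smallest set so intermediate results stay small.
    key_sets.sort(key=len)
    common = key_sets[0]
    for keys in key_sets[1:]:
        common &= keys
        if not common:
            break  # nothing can match, no need to look further
    return common


def mapreduce_inputs(bucket_name, keys):
    """Build [bucket, key] pairs to pass as the inputs of an MR job."""
    return [[bucket_name, key] for key in sorted(keys)]


# Hypothetical key lists, standing in for two index query results.
by_age = ["k1", "k2", "k3"]
by_city = ["k2", "k3", "k4"]

common = intersect_index_results([by_age, by_city])
inputs = mapreduce_inputs("users", common)
```

With this, only the keys that satisfy every index query are handed to the
MR phase, so Riak fetches far fewer objects when the intersection is small.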

Elias
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
