Hi Elias,

That seems to be the ideal solution. If we calculate the intersection of index keys, there is no need to apply the filters again in the map phase. You can just return the data. Right?
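To make the idea concrete, here is a minimal sketch of the client-side intersection, assuming each index query has already returned a plain list of keys. The helper names, the `users` bucket, and the sample keys are hypothetical and not taken from the actual sources. The job is written in the JSON form accepted by Riak's HTTP MapReduce interface and uses the built-in `Riak.mapValuesJson` map function, so the map phase only returns the data and does no filtering.

```python
import json


def intersect_keys(key_lists):
    """Return the keys present in every index query result, sorted."""
    if not key_lists:
        return []
    common = set.intersection(*(set(keys) for keys in key_lists))
    return sorted(common)


def build_mapreduce_job(bucket, keys):
    """Build a MapReduce job whose inputs are explicit [bucket, key] pairs.

    The map phase only returns the stored JSON values; no filter is
    applied, because the intersection already did the filtering.
    """
    return {
        "inputs": [[bucket, key] for key in keys],
        "query": [
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}}
        ],
    }


# Hypothetical results of three separate secondary index queries.
by_city = ["u1", "u2", "u3", "u7"]
by_age = ["u2", "u3", "u9"]
by_plan = ["u3", "u2", "u5"]

keys = intersect_keys([by_city, by_age, by_plan])
job = build_mapreduce_job("users", keys)
print(json.dumps(job, indent=2))
```

With this approach Riak only fetches the objects that match every index query, so the cost of the MapReduce phase tracks the size of the intersection rather than the size of the individual result sets.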
Thanks for sharing your suggestion. I updated the sources to reflect this.

--
Regards,
Sreejith K

On Tue, Feb 28, 2012 at 10:56 PM, Elias Levy <fearsome.lucid...@gmail.com> wrote:

> On Mon, Feb 27, 2012 at 10:35 PM, <riak-users-requ...@lists.basho.com> wrote:
>
>> From: Sreejith K <sreejith...@gmail.com>
>> Subject: Re: Multiple Index Queries using Riak and Python
>>
>> I find this solution extremely useful in our PaaS solution, where we needed
>> to support APIs similar to Google App Engine. Performance is largely
>> dependent on the number of key inputs to the MapReduce phase. It is quite
>> fast when you want to get a small number of records (~1000) from a large
>> data set (in the millions) using filters. But when the MapReduce phase
>> needs to fetch a large number of items, it is a little slow, as expected
>> ;-). I'll surely share our experience as we go further.
>
> Looking at the code, it seems you are taking the results of each index
> query and using those as the inputs of the MR job, which then performs
> filtering. That seems inefficient, particularly if each index query
> results in a very large result set but their intersection does not, as Riak
> will have to fetch each object from disk, parse it, then filter it.
>
> Why not simply compute the intersection of all the index queries in
> the client and use that as the input to the MR phase?
>
> Elias
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com