Hi Daniel,

If you use search to provide (streaming) input to map/reduce, then you can do 
additional processing in the M/R phases to condition and limit your results.  
For example, you can do extra filtering in a map phase if necessary, and 
extract just the subset of each document that you actually need.  You can then 
add one or more reduce phases to sort and paginate (slice) the results.
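
Off the top of my head, a search-fed job over the HTTP /mapred interface would 
look something like this (an untested sketch; the "docs" bucket, the query, and 
the field names are placeholders, and it assumes the indexed documents are also 
stored as JSON objects in KV):

# The search query provides the (streamed) inputs; the map phase filters
# and projects, and the reduce phase sorts and slices down to 1000 records.
curl -X POST http://localhost:8098/mapred \
  -H "Content-Type: application/json" \
  -d @- <<'EOF'
{
  "inputs": {"module": "riak_search",
             "function": "mapred_search",
             "arg": ["docs", "title:foo"]},
  "query": [
    {"map": {"language": "javascript",
             "source": "function(v) { var d = JSON.parse(v.values[0].data); if (d.type !== 'event') return []; return [{key: v.key, ts: d.timestamp}]; }"}},
    {"reduce": {"language": "javascript",
                "source": "function(vals) { vals.sort(function(a, b) { return b.ts - a.ts; }); return vals.slice(0, 1000); }"}}
  ]
}
EOF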

That way you still only return a predetermined number of records from the 
last reduce phase.

--gordon


On Apr 14, 2011, at 12:18, Daniel Rathbone wrote:

To be clear, I'm only talking about the Solr interface.  I'm wondering whether 
my query time will remain fixed (since it's capped at rows=1000) as I add 
several million docs to the index.

If I use my search as an input into Map/Reduce, won't my response time grow 
with my index?  My search query would still queue up a very large result set, 
and I expect performance to suffer if I only trim it down in a reduce phase.

It would seem that I can prevent that slowdown by limiting the rows in the 
search (with rows=1000).  Despite that limit, though, I hit the 
too_many_results error, which indicates that the search queues up a very large 
result set before it applies the row limit.  Is there something I'm missing 
here?
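
For context, my queries look roughly like this (bucket and field names are 
placeholders):

# Solr-style query over HTTP; rows caps what comes back, but the node
# apparently still assembles the full result set before applying it.
curl "http://localhost:8098/solr/docs/select?q=title:foo&start=0&rows=1000"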

thanks,
Daniel



On Thu, Apr 14, 2011 at 7:53 AM, Gordon Tillman 
<gtill...@mezeo.com<mailto:gtill...@mezeo.com>> wrote:
Daniel, the max_search_results setting only applies to searches done via the 
Solr interface.  From 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-January/002974.html:

- System now aborts queries that would queue up too many documents in
  a result set. This is controlled by a 'max_search_results' setting
  in riak_search. Note that this only affects the Solr
  interface. Searches through the Riak Client API that feed into a
  Map/Reduce job are still allowed to execute because the system
  streams those results.
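
If you do end up needing a larger limit for the Solr interface, that setting 
lives in the riak_search section of app.config; the value below is just an 
example, and a bigger value means the node buffers a bigger result set in 
memory per query:

{riak_search, [
    %% illustrative value only
    {max_search_results, 100000}
]}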


So you can use a map-reduce operation (with the search phase providing the 
inputs) and you should be OK.

--gordon

On Apr 14, 2011, at 04:49, Daniel Rathbone wrote:

Hi list,

I'm wondering how Riak Search performance will degrade as I add documents.

For my purposes I limit rows to 1k, and sorting is not necessary.  I have a 
single-node cluster for development.  I know I can increase performance if I 
add nodes, but I'd like to understand this before I do.

My documents are small, ~200 bytes each.  With an index of 30k documents and 
rows limited to 1k, there were no problems.  After I added 100k documents, I 
hit the too_many_results error.  Since I still have my row limit set at 1k, 
this indicates that the query does not return as soon as it finds the first 1k 
hits.  Is there a way to short-circuit my queries so that they don't have to 
scan the whole index?

I got around too_many_results by increasing my max_search_results setting (I 
read https://help.basho.com/entries/480664-i-get-the-error-too-many-results).  
I wonder, though, whether I'll keep running into memory limits as I add a few 
million docs to my index.

Thanks,
Daniel



_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
