The best practice is, well, "It Depends" (tm). First off, I wouldn't do any caching of results unless and until you had reasonable certainty that you had performance issues, so (b) would be my first choice. And if you *did* start to see performance issues, I'd look first at why the queries were expensive rather than reach for caching. And I'd make certain, by mining the query logs, that you were actually getting a lot of requests for pages 2-N. There's no point in putting a caching scheme in if only 10% of your queries are for subsequent pages, or even 50%.
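To make (b) concrete, here's a minimal sketch, assuming the 2.4 search(Query, Filter, int, Sort) API that Paul quotes below. fetchPage and its parameters are illustrative names, not anything in Lucene:

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldDocs;

// Re-execute the query for each page and discard the leading hits.
// pageIndex is zero-based.
public static Document[] fetchPage(IndexSearcher searcher, Query query,
                                   Sort sort, int pageIndex, int pageSize)
        throws IOException {
    // Ask for enough hits to cover every page up to and including this one.
    int needed = (pageIndex + 1) * pageSize;
    TopFieldDocs top = searcher.search(query, null, needed, sort);

    int start = pageIndex * pageSize;
    int end = Math.min(needed, top.scoreDocs.length);
    if (start >= end) {
        return new Document[0]; // asked for a page past the last hit
    }

    Document[] page = new Document[end - start];
    for (int i = start; i < end; i++) {
        // Only this page's documents are actually loaded; each skipped
        // ScoreDoc is just an int and a float, so the waste is tiny.
        page[i - start] = searcher.doc(top.scoreDocs[i].doc);
    }
    return page;
}

The discarded ScoreDocs are cheap; loading stored Documents is what costs, and the loop above only does that for the page actually being displayed.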
The thing to remember is that every search/sort *must* score and/or sort all the documents to catch the case that the very last document in the index is the best match. So a method that only returned matches N through N+pagesize would only save the time/memory needed to copy matches 0 through N, and each ScoreDoc is just an int and a float. You can copy a LOT of ScoreDocs around before you notice...

What a caching scheme *would* save is re-executing the query. But long before I went to a caching scheme, I'd try to understand why my queries were slow, especially when you couple that with the fact that the overwhelming majority of users don't page very far into the result set before changing the query.

From the eXtreme Programming people: "Do the simplest thing that could possibly work." I'd add the addendum: "Then *measure* to see what the problems are before 'fixing' anything."

FWIW
Erick

On Wed, Feb 18, 2009 at 10:29 PM, <rolaren...@earthlink.net> wrote:
> R2.4
>
> So, I may well be missing something here, but: I use
>
> <pseudoCode>IndexSearcher.search(someQuery, null, count, new Sort());</pseudoCode>
>
> to get an instance of TopFieldDocs (the "Hits" is deprecated). So far, all
> fine; I get a bunch of documents. Now, what is the Lucene best practice for
> getting the *next* batch of size "count"? (Didn't see this discussed
> anywhere, but maybe I missed it.)
>
> a) I could guess that my users will never want more than "N*count", for
> some value of N, request that right up front, and do all my own "paging"
> using the one TopFieldDocs instance;
>
> b) I could assume that (a) will be an inefficient memory and time hog, and
> when the user clicks "Next" (or whatever), then ... (with i starting at "1")
> get a new TopFieldDocs with "(++i)*count", and out of that discard the first
> "i*count" items? In the limit (as i => N) that uses up just as much space
> and memory, but does so lazily (better);
>
> c) some compromise of (a) and (b), where I get M*count, do my own paging,
> and when the user asks for the (i+1)==(M+1)-th batch, then get another
> M*count (maybe faster, but also maybe a bigger amortized memory footprint);
>
> d) something else? (I'd hope for something like a search() method with some
> parameter saying, in effect, "such and such a range of hits" ...)
>
> thanks,
> Paul
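As a starting point for the "measure first" advice above, a quick timing check might look like the following, again assuming the 2.4 API; searcher, someQuery, and count are the names from Paul's snippet:

// Time the raw search before deciding any caching is needed.
long startNs = System.nanoTime();
TopFieldDocs top = searcher.search(someQuery, null, count, new Sort());
long elapsedMs = (System.nanoTime() - startNs) / 1000000L;
System.out.println("search took " + elapsedMs + " ms for "
        + top.totalHits + " total hits");

If that number is already small, neither a smarter paging scheme nor a cache will make a noticeable difference to users.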