Here is the gist of the code:

    Query query = new TermQuery(new Term("contents", q.toLowerCase()));

    long start = new Date().getTime();
    Hits hits = is.search(query);
    long end = new Date().getTime();
    System.err.println("Found " + hits.length() + " document(s) (in "
        + (end - start) + " milliseconds) that matched query '" + q + "'");

    int ct = hits.length();
    int ct2 = 400000;   // starting offset into the hits
    int step = 10000;   // batch size between re-queries
    int startct;

    while (ct2 < ct) {
        startct = ct2;
        // fetch one batch, reading only the stored filename
        for (int i = startct; i < startct + step; i++) {
            if (ct2 >= ct) {
                break;
            }
            Document doc = hits.doc(ct2);
            doc.get("filename");
            ct2++;
        }
        System.out.println("ct2 is " + ct2);

        // close and reopen everything, hoping to drop the Hits cache
        ir.close();
        is.close();
        fsDir.close();
        ir = null;
        is = null;
        fsDir = null;
        fsDir = FSDirectory.getDirectory(indexDir, false);
        ir = IndexReader.open(fsDir);
        is = new IndexSearcher(ir);
        hits = is.search(query);
    }

If ct2 is set to 40,000 as opposed to 400,000, I see some output before I get
the out-of-memory error. If not, I get the out-of-memory error almost
instantly, without any output. Is there a method call to clear the cache?

Thank you for your response.

On 5/14/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Could you share at least some pseudo-code of what you're doing in the
loop of retrieving the "name" of each document?  Are you storing all of
those names as you iterate?  Have you profiled your application to see
exactly where the memory is going?  It is surely being eaten by your
own code and not Lucene.

        Erik

On May 14, 2006, at 12:07 PM, Beady Geraghty wrote:

> I have an out-of-memory error when returning many hits.
>
> I am still on Lucene 1.4.3.
>
> I have a simple term query. It returned 899810 documents.
> I tried to retrieve the name of each document and nothing else,
> and I ran out of memory.
>
> Instead of getting the names all at once, I tried to query again
> after every 10,000 documents.
> I close the index reader, index searcher, and the fsDir and re-query
> for every 10,000 documents. This still doesn't work.
>
> From another entry in the forum, it appears that the information
> about the hits that I have skipped over is still kept even though I
> don't access them. Am I understanding it correctly that if I start
> accessing from the 400000th document onwards, some information about
> documents 0-399999 is still cached even though I have skipped over
> those?
> Is there a way to get the file name (and perhaps other information)
> of the remaining documents?
>
> (I tried a different term query that returned a hit size of 400000,
> and I was able to get the names of them all without re-querying.)
>
> I think I saw someone mention clearing the hit cache, though I don't
> know how this is done.
>
> Thank you in advance for any hints on dealing with this.
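
One way around this in Lucene 1.4.x is to skip the Hits class entirely and
use the lower-level HitCollector API: collect() is handed only an int doc id
and a score per match, so nothing is cached per hit. Below is a minimal
sketch along those lines; it assumes the searcher/reader setup and the
stored "filename" field from the code above, and the class and method names
are invented for illustration:

    import java.io.IOException;
    import java.util.BitSet;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.HitCollector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class CollectFilenames {

        // A minimal sketch, assuming Lucene 1.4.x; names are hypothetical.
        public static void printFilenames(IndexSearcher is, IndexReader ir,
                String q) throws IOException {
            Query query = new TermQuery(
                new Term("contents", q.toLowerCase()));

            // Pass 1: mark matching doc ids, one bit per document in the
            // index, instead of letting Hits accumulate per-hit state.
            final BitSet matches = new BitSet(ir.maxDoc());
            is.search(query, new HitCollector() {
                public void collect(int doc, float score) {
                    matches.set(doc);
                }
            });

            // Pass 2: load stored fields one document at a time; each
            // Document becomes garbage as soon as the loop moves on,
            // so memory use stays flat.
            for (int id = matches.nextSetBit(0); id >= 0;
                    id = matches.nextSetBit(id + 1)) {
                Document doc = ir.document(id);
                System.out.println(doc.get("filename"));
            }
        }
    }

The BitSet costs one bit per document in the index, so even with 899,810
hits the match set stays on the order of a hundred kilobytes, regardless
of how many documents are retrieved afterwards.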
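
And since the query here is a single TermQuery, there is an even more direct
route: walk the term's posting list with IndexReader.termDocs(), which
involves no searcher, no scoring, and no hit cache at all. Again a minimal
sketch under the same assumptions (hypothetical class and method names, a
stored "filename" field):

    import java.io.IOException;

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermDocs;

    public class TermDocsFilenames {

        // A minimal sketch, assuming Lucene 1.4.x; names are hypothetical.
        public static void printFilenames(IndexReader ir, String q)
                throws IOException {
            TermDocs td = ir.termDocs(new Term("contents", q.toLowerCase()));
            try {
                while (td.next()) {
                    // One stored-field lookup per match; nothing accumulates
                    // between iterations.
                    Document doc = ir.document(td.doc());
                    System.out.println(doc.get("filename"));
                }
            } finally {
                td.close();
            }
        }
    }

This only works because the query is a single term; for anything more
complex, the HitCollector approach above is the general-purpose option.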