Asking for top 100K docs will certainly consume more RAM than asking for top 2, but much less than 1 GB. More like maybe an added ~2-3 MB or so.
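To put rough numbers on that: the collector's footprint is dominated by its priority queue of numHits ScoreDoc entries (an int doc id plus a float score, plus object overhead). The sketch below is only a back-of-the-envelope estimate; the 28 bytes per entry is an assumed round figure, not a measurement.

------------------------------------------------------------------------
// Rough estimate of the heap held by a top-N collector's priority queue.
// The per-entry cost is assumed (~28 bytes: object header + int doc id +
// float score); real numbers depend on the JVM, but the scaling with
// numHits is the point.
public class CollectorFootprint {
    public static void main(String[] args) {
        long bytesPerEntry = 28; // assumed, not measured
        System.out.println("top 2:       ~" + (2L * bytesPerEntry) + " bytes");
        System.out.println(String.format("top 100,000: ~%.1f MB",
                100000L * bytesPerEntry / (1024.0 * 1024.0)));
    }
}
------------------------------------------------------------------------

Either way, the collector is not where a full gigabyte goes; that comes from the terms index discussed further down the thread.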
Mike

On Wed, Jun 10, 2009 at 1:30 PM, Zhang, Lisheng <lisheng.zh...@broadvision.com> wrote:
> Hi,
>
> Does this issue have anything to do with the line:
>
>> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
>
> If we do:
>
>> TopScoreDocCollector collector = new TopScoreDocCollector(2);
>
> instead (only seeing the top two documents), could memory usage be less?
>
> Best regards,
> Lisheng
>
> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Wednesday, June 10, 2009 5:40 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene memory usage
>
> This (a very large number of unique terms) is a problem for Lucene currently.
>
> There are some simple improvements we could make to the terms dict format so it does not require so much RAM per term in the terms index... LUCENE-1458 (flexible indexing) has these improvements, but unfortunately they are tied in with lots of other changes. Maybe we should break out a separate issue for this... it'd be a great contained improvement, if anyone out there has "the itch" :)
>
> One simple workaround is to call IndexReader.setTermIndexInterval immediately after opening the reader; this loads fewer terms into the in-memory terms index, using far less RAM, at the expense of somewhat slower searching.
>
> Also: you should peek at your index, eg using Luke, to understand why you have so many terms. It could be legitimate (indexing a massive catalog with eg part numbers), or it could be that your document filtering / analyzer is accidentally producing garbage terms.
>
> Mike
>
> On Wed, Jun 10, 2009 at 8:23 AM, Benedikt Boss <na...@web.de> wrote:
>> Hej hej,
>>
>> I have a question regarding Lucene's memory usage when executing a query. When I run my query, Lucene eats up over 1 GB of heap memory even when my result set is only a single hit. I found out that this is due to the "ensureIndexIsRead()" method call in the "TermInfosReader" class, which iterates over all terms found in the index and saves them (including all value strings) in a Term array. Is it possible to avoid reading all of that into memory?
>>
>> I'm doing the query with the following pseudo-code:
>> ------------------------------------------------------------------------
>>
>> TopScoreDocCollector collector = new TopScoreDocCollector(100000);
>>
>> QueryParser parser = new QueryParser(field, new WhitespaceAnalyzer());
>> Directory fsDir = new FSDirectory(indexDir, null);
>> IndexSearcher is = new IndexSearcher(fsDir);
>>
>> Query query = parser.parse(q);
>>
>> is.search(query, collector);
>> ScoreDoc[] hits = collector.topDocs();
>>
>> .......
>> < iterate over hits and print results >
>>
>> Thanks in advance
>> Benedikt
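As an aside on the quoted pseudo-code: it is not quite compilable as written, since FSDirectory is normally obtained through a factory method rather than constructed directly, and collector.topDocs() returns a TopDocs object rather than a ScoreDoc[]. Below is a minimal runnable sketch of the same search, assuming a Lucene 2.x-era API; whether the collector is built with a constructor (as in the thread) or with a TopScoreDocCollector.create(...) factory depends on the exact release, and the class name, argument handling, and numHits of 100 are illustrative only.

------------------------------------------------------------------------
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SearchSketch {
    public static void main(String[] args) throws Exception {
        String indexDir = args[0]; // path to the index directory
        String field = args[1];    // field to search
        String q = args[2];        // query string

        // 2.x-era factory; later releases use FSDirectory.open(new File(indexDir)).
        Directory fsDir = FSDirectory.getDirectory(indexDir);
        IndexSearcher is = new IndexSearcher(fsDir);

        QueryParser parser = new QueryParser(field, new WhitespaceAnalyzer());
        Query query = parser.parse(q);

        // Size the collector to the number of hits actually needed; this mirrors
        // the constructor used in the thread (newer releases expose a
        // TopScoreDocCollector.create(...) factory instead).
        TopScoreDocCollector collector = new TopScoreDocCollector(100);
        is.search(query, collector);

        // topDocs() returns a TopDocs; the hits live in its scoreDocs array.
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
        for (ScoreDoc hit : hits) {
            System.out.println("doc=" + hit.doc + " score=" + hit.score);
        }

        is.close();
        fsDir.close();
    }
}
------------------------------------------------------------------------

Note that shrinking numHits only saves the few megabytes discussed at the top of this thread; the gigabyte of heap comes from the terms index, independent of the collector.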
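Mike's suggestion to peek at the index does not strictly require Luke; the same check can be scripted. The sketch below (class name and argument handling are illustrative) walks the 2.x-era TermEnum API and counts unique terms per field, which usually makes it obvious whether an analyzer is producing garbage terms.

------------------------------------------------------------------------
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Counts unique terms per field, roughly what Luke's overview tab reports.
public class TermCounter {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory(args[0]);
        IndexReader reader = IndexReader.open(dir);

        Map<String, Long> perField = new HashMap<String, Long>();
        long total = 0;
        TermEnum terms = reader.terms(); // next() must be called before term()
        while (terms.next()) {
            Term t = terms.term();
            Long count = perField.get(t.field());
            perField.put(t.field(), count == null ? 1L : count + 1L);
            total++;
        }
        terms.close();

        System.out.println("total unique terms: " + total);
        for (Map.Entry<String, Long> e : perField.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }

        reader.close();
        dir.close();
    }
}
------------------------------------------------------------------------

If the term count turns out to be legitimate, the read-time workaround mentioned above (loading only every Nth term into the in-memory terms index) trades RAM for somewhat slower term lookups; the exact setter name varies by release, so check the IndexReader javadocs for your version.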