OutofMemory in large index

2009-11-12 Thread Wenbo Zhao
Hi all, I got an OutOfMemoryError at org.apache.lucene.search.Searcher.search(Searcher.java:183). My index is 43 GB. Is that too big for Lucene? Luke shows that the index has over 1800M docs, but the search also runs out of memory. I use -Xmx1024M to specify a 1 GB Java heap. One abnormal thing is

Re: docBase Parameter in Collector.setNextReader

2009-11-12 Thread Michael McCandless
On Thu, Nov 12, 2009 at 5:28 PM, Uwe Schindler wrote: > Mike: What was the reason for this change? We first thought this (visiting segments from largest to smallest size) improved performance, but, then we decided a better optimization was for Collectors to save tie breaking by knowing the docID

Re: Wrapping IndexSearcher so that it is safe?

2009-11-12 Thread Michael McCandless
On Thu, Nov 12, 2009 at 5:45 PM, Jacob Rhoden wrote: >> SearcherManager can work with a near real-time reader (via >> IndexWriter.getReader), or with a standalone reader (via >> IndexReader.open), so that's another source of more complexity vs your >> use case. > > There can be quite a large numbe

Re: Verbose logging via ant, get an OOM

2009-11-12 Thread Jason Rutherglen
Thanks Uwe and Mike! On Thu, Nov 12, 2009 at 11:48 AM, Michael McCandless wrote: > Or, just run the junit test "directly", which doesn't try to buffer > the output, so you can see it "live".  Something like this: > > java -cp > .:/usr/local/src/junit-4.4.jar:./build/classes/test:./build/classes/

Re: Wrapping IndexSearcher so that it is safe?

2009-11-12 Thread Jacob Rhoden
On 13/11/2009, at 9:19 AM, Michael McCandless wrote: On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden > wrote: The source code for SearcherManager is even downloadable for free: http://www.manning.com/hatcher3/LIAsourcecode.zip The example source code does some things that is beyond my level o

Re: Unexpected results searching for phrase with stop words

2009-11-12 Thread Simon Wistow
On Thu, Nov 12, 2009 at 09:20:30PM +0100, Uwe Schindler said: > Which version of Lucene are you using and which Version constant do you pass > to Analyzer and Query Parser? In 2.9.0 there was a bug/incorrect setting > between the query parser and the Version.LUCENE_CURRENT / Version.LUCENE_29 > set

RE: docBase Parameter in Collector.setNextReader

2009-11-12 Thread Uwe Schindler
> By the way, the docStarts should be 5 and then 0, as IndexSearcher starts > to > search bigger segments first. Maybe this is your problem, that you have > only > looked at the second call? Oh, that's no longer the case. Sorry. The docBases should be sorted upwards. Mike: What was the reason for

Re: Wrapping IndexSearcher so that it is safe?

2009-11-12 Thread Michael McCandless
On Thu, Nov 12, 2009 at 4:44 PM, Jacob Rhoden wrote: > > On 12/11/2009, at 8:42 PM, Michael McCandless wrote: > >> On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden >> wrote: >>> >>> The source code for SearcherManager is even downloadable for free: >>>  http://www.manning.com/hatcher3/LIAsourcecode.

RE: docBase Parameter in Collector.setNextReader

2009-11-12 Thread Uwe Schindler
Could it be that you are using the expert IndexSearcher ctor that takes the sub reader array and docStarts? Else it is impossible that all docBases are 0 (look into the code). By the way, the docStarts should be 5 and then 0, as IndexSearcher starts to search bigger segments first. Maybe this is you

Re: docBase Parameter in Collector.setNextReader

2009-11-12 Thread Michael McCandless
Yes it should be 0 and 5. I'm not sure what would cause 0 and 0, offhand. Can you make a small standalone test case showing it? Mike On Thu, Nov 12, 2009 at 4:25 PM, Benjamin Heilbrunn wrote: > Hello everyone, > > I'm a little bit confused about the docBase parameter of > Collector.setNextRead
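
For reference, the expected values 0 and 5 follow from each segment's docBase being the sum of the document counts of the segments before it. A minimal plain-Java sketch of that arithmetic, assuming segments are visited in index order and using the 5-doc and 7-doc segments from the original question. `DocBaseSketch` and `docBases` are hypothetical names for illustration, not Lucene API:

```java
import java.util.Arrays;

final class DocBaseSketch {
    // Hypothetical helper (not part of Lucene): the docBase of a segment is
    // the total number of documents in all segments that come before it.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int running = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = running;          // offset of this segment's first doc
            running += segmentSizes[i];  // advance past this segment's docs
        }
        return bases;
    }

    public static void main(String[] args) {
        // Two commits: 5 docs, then 7 docs (the scenario from the question).
        System.out.println(Arrays.toString(docBases(new int[] {5, 7})));
    }
}
```

A Collector would add the docBase passed to setNextReader to each segment-relative docID to get an index-wide docID; getting 0 for both segments would make docs from the two segments collide.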

Re: Wrapping IndexSearcher so that it is safe?

2009-11-12 Thread Jacob Rhoden
On 12/11/2009, at 8:42 PM, Michael McCandless wrote: On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden wrote: The source code for SearcherManager is even downloadable for free: http://www.manning.com/hatcher3/LIAsourcecode.zip The example source code does some things that is beyond my level o

docBase Parameter in Collector.setNextReader

2009-11-12 Thread Benjamin Heilbrunn
Hello everyone, I'm a little bit confused about the docBase parameter of Collector.setNextReader. Imagine the following: - Create new Index - Index 5 docs - Call IndexWriter.commit() - Index 7 docs - Call IndexWriter.commit() - close Writer Now I have a 2-segment index right? I have

RE: Unexpected results searching for phrase with stop words

2009-11-12 Thread Uwe Schindler
Which version of Lucene are you using and which Version constant do you pass to Analyzer and Query Parser? In 2.9.0 there was a bug/incorrect setting between the query parser and the Version.LUCENE_CURRENT / Version.LUCENE_29 setting. If you did not enable position increments in query parser, that

Re: Unexpected results searching for phrase with stop words

2009-11-12 Thread Erick Erickson
Yes, you're doing something wrong. What, you may ask? Well, it's kind of hard to say without knowing what analyzers you use at index AND query time and what the query you're submitting looks like... But the very first thing I'd try is to get a copy of Luke and peek at your index to see if what yo

Unexpected results searching for phrase with stop words

2009-11-12 Thread Simon Wistow
I have a document with the title "Here, there be dragons" and a body. When I search for Here, there be dragons (no quotes) with a title boost of 2.0 and a body boost of 0.8 I get the document as the first hit which is what I'd expect. However, if I change the query to "Here, there be dragons"

Re: Verbose logging via ant, get an OOM

2009-11-12 Thread Michael McCandless
Or, just run the junit test "directly", which doesn't try to buffer the output, so you can see it "live". Something like this: java -cp .:/usr/local/src/junit-4.4.jar:./build/classes/test:./build/classes/java:./build/classes/demo -Dlucene.version=2.9-dev -DtempDir=build -ea org.junit.runner.JUni

RE: Verbose logging via ant, get an OOM

2009-11-12 Thread Uwe Schindler
Raise -Xmx, there is a setting in common-build.xml or build.xml - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] > Sent: Thursday, November 12, 2009 8:3

Verbose logging via ant, get an OOM

2009-11-12 Thread Jason Rutherglen
Is there a setting to fix this? [junit] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space [junit] at java.util.Arrays.copyOf(Arrays.java:2882) [junit] at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100) [junit] at java.lang

Re: OutOfMemoryError when using Sort

2009-11-12 Thread Nuno Seco
Thanks again I am not really sure, why it is enough for you to sort the first 50 highest ranking hits, but if you only want to do this, sorting afterwards is quite straightforward. Just to clarify.. And I know it may seem strange... But I'll mostly be conducting "long" phrase (3 or 4 word)

Re: IndexWriter.close() no longer seems to close everything

2009-11-12 Thread Jason Rutherglen
If there's a bug you're seeing, it's helpful to open an issue and post code reproducing it. On Wed, Nov 11, 2009 at 3:41 AM, Albert Juhe wrote: > > I think that this is the best way to proceed. > > thank you Mike > > > > Michael McCandless-2 wrote: >> >> Can you narrow the leak down to a small se

RE: OutOfMemoryError when using Sort

2009-11-12 Thread Uwe Schindler
I am not really sure, why it is enough for you to sort the first 50 highest ranking hits, but if you only want to do this, sorting afterwards is quite straightforward. Another idea is to not index the count itself, but more use the count as a boost factor for each document. The ranking algorithm o

Re: OutOfMemoryError when using Sort

2009-11-12 Thread Jake Mannix
It is only sorting the top 50 hits, yes, but to do that, it needs to look at the *value* of the field for each and every one of the billions of documents. You can do this without using memory if you're willing to deal with disk seeks, but doing billions of those is going to mean that this query most
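
To illustrate the point about needing every document's value: a rough plain-Java sketch of picking the top n documents by a per-document value with a bounded min-heap. This is not Lucene's implementation; `TopNByFieldSketch`, `topN`, and the in-memory `fieldValues` array (indexed by docID) are assumptions made for the example. Only n entries are kept in the heap, but the loop still has to read the value of every candidate document:

```java
import java.util.PriorityQueue;

final class TopNByFieldSketch {
    // Hypothetical helper: returns the docIDs of the n largest values,
    // best first. fieldValues[doc] stands in for a per-document sort value.
    static int[] topN(int[] fieldValues, int n) {
        // Min-heap on field value: the root is the weakest of the current top n.
        PriorityQueue<Integer> heap = new PriorityQueue<>(
                Math.max(1, n),
                (a, b) -> Integer.compare(fieldValues[a], fieldValues[b]));
        for (int doc = 0; doc < fieldValues.length; doc++) {
            if (heap.size() < n) {
                heap.add(doc);
            } else if (n > 0 && fieldValues[doc] > fieldValues[heap.peek()]) {
                heap.poll();   // evict the current weakest entry
                heap.add(doc);
            }
        }
        // Drain weakest-first, filling the result from the back.
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {3, 9, 1, 7, 5};
        for (int doc : topN(values, 2)) {
            System.out.println("doc " + doc + " value " + values[doc]);
        }
    }
}
```

The heap stays small regardless of index size; what costs memory (or disk seeks) is making the per-document values available to the comparison in the first place.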

Re: OutOfMemoryError when using Sort

2009-11-12 Thread Nuno Seco
Ok. Thanks. The doc. says: "Finds the top |n| hits for |query|, applying |filter| if non-null, and sorting the hits by the criteria in |sort|." I understood that only the hits (50 in this case) for the current search would be sorted... I'll just do the ordering afterwards. Thank you for clarifyin

Using a precompiled index with a WAR archive

2009-11-12 Thread Nathan Howard
I'm trying to use a precompiled Lucene index from within a WAR archive, and was having difficulty, but found a possible solution: http://mail-archives.apache.org/mod_mbox/lucene-java-user/200305.mbox/%3c20030524152100.28075.qm...@web12707.mail.yahoo.com%3e The gotcha to the solution: it's written

Re: OutOfMemoryError when using Sort

2009-11-12 Thread Jake Mannix
Sorting utilizes a FieldCache: the forward lookup - the value a document has for a particular field (as opposed to the usual "inverted" way of looking at all documents which contain a given term), which lives in memory, and takes up as much space as 4 bytes * numDocs. If you've indexed the en
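
As a back-of-the-envelope check of the 4 bytes * numDocs figure, here is a small plain-Java sketch. The document count of one billion is an assumed round number chosen for illustration (the thread only says "billions"), and `FieldCacheEstimate` and `intFieldCacheBytes` are hypothetical names, not Lucene API:

```java
import java.util.Locale;

final class FieldCacheEstimate {
    // Hypothetical helper: memory for one int-valued sort field,
    // at 4 bytes per document as described above.
    static long intFieldCacheBytes(long numDocs) {
        return 4L * numDocs; // long arithmetic avoids int overflow
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000_000L; // assumed count, for illustration only
        long bytes = intFieldCacheBytes(numDocs);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.println(String.format(Locale.ROOT,
                "%d docs -> %d bytes (about %.1f GiB)", numDocs, bytes, gib));
    }
}
```

Under that assumption a single sort field already needs roughly 4 GB of heap, well beyond a 1 GB -Xmx setting, which is consistent with the OutOfMemoryError appearing only when the Sort is added.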

RE: OutOfMemoryError when using Sort

2009-11-12 Thread Uwe Schindler
To sort on it, the count field must be indexed (but not tokenized); it does not need to be stored. But in any case, sort needs lots of memory. How many documents do you have? Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original M

Re: IndexWriter.close() no longer seems to close everything

2009-11-12 Thread John Wang
If you run the zoie test turned to nrt, you can replicate it rather easily: While the test is running, do lsof on your process, e.g. lsof -p | | wc -John On Thu, Nov 12, 2009 at 8:24 AM, John Wang wrote: > Well, I have code in the finally block to call IndexReader.close for every > reader I

Re: IndexWriter.close() no longer seems to close everything

2009-11-12 Thread John Wang
Well, I have code in the finally block to call IndexReader.close for every reader I get from IndexWriter.getReader. On Mon, Nov 9, 2009 at 2:43 AM, Michael McCandless < luc...@mikemccandless.com> wrote: > Does this look like a real leak John? You're definitely closing every > reader you get back

OutOfMemoryError when using Sort

2009-11-12 Thread Nuno Seco
Hello List. I'm having a problem when I add a Sort object to my searcher: docs = searcher.search(parser.parse(search), null, 50, sort); Every time I execute a query I get an OutOfMemoryError exception. But if I execute the query without the Sort object it works fine. Let me briefly explain ho

How to read the index in terms order

2009-11-12 Thread Jean-Claude Dauphin
Dear all, I am pretty sure it's trivial and I apologize for raising this issue. I wish to access the index in the order driven by: Term+"Field name"+Frequency or Frequency+Term+"Field Name". I read the terms in the order driven by "Field name"+Term+Frequency as follows: Directory fsd =

Re: Wrapping IndexSearcher so that it is safe?

2009-11-12 Thread Michael McCandless
On Wed, Nov 11, 2009 at 7:33 PM, Jacob Rhoden wrote: > The source code for SearcherManager is even downloadable for free: >   http://www.manning.com/hatcher3/LIAsourcecode.zip > > The example source code does some things that is beyond my level of > understanding > of lucene. ie: > 1) To me it loo