Re: IndexWriter.deleteDocuments(Term) vs IndexReader.deleteDocuments(Term)

2007-03-15 Thread Michael McCandless
"Antony Bowesman" <[EMAIL PROTECTED]> wrote: > The writer method does not return the number of deleted documents. Is there a technical reason why this is not done? > I am planning to see about converting my batch deletions using IndexReader to IndexWriter, but I'm currently using the

IndexWriter.deleteDocuments(Term) vs IndexReader.deleteDocuments(Term)

2007-03-15 Thread Antony Bowesman
The writer method does not return the number of deleted documents. Is there a technical reason why this is not done? I am planning to see about converting my batch deletions using IndexReader to IndexWriter, but I'm currently using the return value to record stats. Does the following give th
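For context: in Lucene 2.1, IndexReader.deleteDocuments(Term) returns an int count, while IndexWriter.deleteDocuments(Term) returns void. One plausible reason is that IndexWriter buffers delete terms and applies them only when it flushes, so the count is not known at call time. The toy model below (hypothetical class and method names, not Lucene code) illustrates that difference and one way to keep stats: count the live matches before buffering the delete. Note that pre-counting can overcount if two buffered deletes match the same document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model, not Lucene code: each document is a set of "field:text" terms.
class BufferedDeleteModel {
    private static final class Pending {
        final String term;
        final int docLimit; // only docs added before the delete call are affected
        Pending(String term, int docLimit) { this.term = term; this.docLimit = docLimit; }
    }

    private final List<Set<String>> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();
    private final List<Pending> pending = new ArrayList<>();

    void addDocument(String... terms) {
        docs.add(new HashSet<>(Arrays.asList(terms)));
    }

    // Reader-style: delete immediately and report how many live docs were removed.
    int deleteNow(String term) {
        return applyDelete(term, docs.size());
    }

    // Writer-style: only remember the term; the count is unknown at this point.
    void deleteBuffered(String term) {
        pending.add(new Pending(term, docs.size()));
    }

    // Applies buffered deletes; only now is the number of removed docs known.
    int flush() {
        int total = 0;
        for (Pending p : pending) total += applyDelete(p.term, p.docLimit);
        pending.clear();
        return total;
    }

    // Stats workaround: count live matches before buffering the delete.
    int countLive(String term) {
        int count = 0;
        for (int id = 0; id < docs.size(); id++) {
            if (!deleted.get(id) && docs.get(id).contains(term)) count++;
        }
        return count;
    }

    int liveCount() { return docs.size() - deleted.cardinality(); }

    private int applyDelete(String term, int docLimit) {
        int removed = 0;
        for (int id = 0; id < docLimit; id++) {
            if (!deleted.get(id) && docs.get(id).contains(term)) {
                deleted.set(id);
                removed++;
            }
        }
        return removed;
    }
}
```

In real code the pre-count would have to come from an IndexReader opened on the index; how that is wired up is left as an assumption here.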

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Suman Ghosh wrote: The search functionality must be available during the index build. Since a relatively small number of documents are being affected (and also we plan to perform the build during a period of time we know to be relatively quiet, based on the last 2 years of site access data) during the buil

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Suman Ghosh
The search functionality must be available during the index build. Since a relatively small number of documents are being affected (and also we plan to perform the build during a period of time we know to be relatively quiet, based on the last 2 years of site access data) during the build process, we hope tha

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
This looks correct to me. It's good you are doing the deletes "in bulk" up front for each batch of documents. So I guess you hit the error (& 5000 segments files) while processing batches of 200 docs (because you then optimize in the end)? Do you search this index while it's building, or, only

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Michael McCandless
Yonik Seeley wrote: Actually, in previous versions of Lucene, it *was* possible to get way too many first level segments because of the wonky logic when the IndexWriter was closed. That has been fixed in the trunk with the new merge policy, and you will never see more than mergeFactor first lev

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Yonik Seeley
On 11/27/06, Michael McCandless <[EMAIL PROTECTED]> wrote: Suman Ghosh wrote: > On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: >> On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: >> > Here are the values: >> > >> > mergeFactor=10 >> > maxMergeDocs=10 >> > minMergeDocs=100 >> > >> >

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-28 Thread Suman Ghosh
Mike, below is the pseudo-code of the application. A few implementation points to understand the pseudo-code: - We have a home-grown thread pool class that allows us to index multiple documents in parallel. We usually submit 200 jobs to the pool (2-3 worker threads usually for the pool). O
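Suman's pseudo-code (truncated in this archive) describes submitting 200 jobs per batch to a pool of 2-3 worker threads. A minimal sketch of that batching shape, using the standard java.util.concurrent classes in place of the home-grown pool, might look like this. BatchRunner and runBatch are hypothetical names, and the per-document work is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: a fixed pool from java.util.concurrent standing in for the
// poster's home-grown thread pool. The indexing work itself is a placeholder.
class BatchRunner {
    // Submits one job per document id, then blocks until the whole batch is done,
    // so the caller can switch between deleting and adding between batches.
    static int runBatch(List<String> docIds, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Set<String> done = ConcurrentHashMap.newKeySet();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String id : docIds) {
                // Placeholder for text extraction and building the Lucene Document.
                futures.add(pool.submit(() -> { done.add(id); }));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces any exception a worker hit, wrapped in ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("indexing job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return done.size();
    }
}
```

Waiting on every Future before returning gives a clean batch boundary, which is where the bulk deletes for the next batch would go.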

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Mike, I've not tried it yet, but I think the problem can be reproduced. However, it'll take a few hours to reach that threshold since my code also needs to extract text from some very large PDF documents to store in the index. I'll post the pseudo-code of my code tomorrow. Maybe that'll help poi

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Michael McCandless
Suman Ghosh wrote: On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: > Here are the values: > > mergeFactor=10 > maxMergeDocs=10 > minMergeDocs=100 > > And I see your point. At the time of the crash, I have over 5000 > segments. I'll tr

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Yonik, Thanks for the pointer. I'll try the nightly build once the change is committed. Suman On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: > Here are the values: > > mergeFactor=10 > maxMergeDocs=10 > minMergeDocs=100 > > And I se

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Yonik Seeley
On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your point. At the time of the crash, I have over 5000 segments. I'll try some conservative number and try to rebuild the index. Although I don't see how thos

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
Here are the values: mergeFactor=10 maxMergeDocs=10 minMergeDocs=100 And I see your point. At the time of the crash, I have over 5000 segments. I'll try some conservative number and try to rebuild the index. On 11/27/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 11/27/06, Suman Ghosh <[E
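Why maxMergeDocs=10 matters can be shown with a deliberately simplified simulation. It assumes (our simplification, not Lucene's exact algorithm) that minMergeDocs acts as the in-memory buffer, so each flush writes one segment of that many docs, and that mergeFactor equal-sized segments are merged only when the result would not exceed maxMergeDocs. With maxMergeDocs below the 100-doc flush size, no merge ever qualifies, so the segment count grows linearly with the document count. The 500,000-document total in the test is hypothetical, chosen so the unmerged case lands on 5,000 segments.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a logarithmic merge policy (an assumption, not Lucene's code).
class MergeSim {
    static int segmentsAfter(int totalDocs, int flushDocs, int mergeFactor, int maxMergeDocs) {
        List<Integer> segs = new ArrayList<>(); // sizes of on-disk segments, oldest first
        for (int flushed = 0; flushed < totalDocs; flushed += flushDocs) {
            segs.add(Math.min(flushDocs, totalDocs - flushed)); // one flush = one new segment
            boolean merged = true;
            while (merged) { // cascade: a merge may complete a group at the next level
                merged = false;
                int n = segs.size();
                if (n < mergeFactor) break;
                int size = segs.get(n - 1);
                long sum = 0;
                boolean sameLevel = true;
                for (int i = n - mergeFactor; i < n; i++) {
                    if (segs.get(i) != size) { sameLevel = false; break; }
                    sum += segs.get(i);
                }
                // Merge the last mergeFactor same-sized segments unless too large.
                if (sameLevel && sum <= maxMergeDocs) {
                    segs.subList(n - mergeFactor, n).clear();
                    segs.add((int) sum);
                    merged = true;
                }
            }
        }
        return segs.size();
    }
}
```

Under this model, mergeFactor=10 with maxMergeDocs=10 leaves 5,000 segments after 500,000 docs, while an unlimited maxMergeDocs collapses the same docs into 5 segments of 100,000.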

Re: StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Yonik Seeley
On 11/27/06, Suman Ghosh <[EMAIL PROTECTED]> wrote: The last line [at org.apache.lucene.index.MultiTermDocs.next(MultiReader.java:349)] repeats another 1010 times before the program crashes. I understand that without the actual index or the documents, it's nearly impossible to narrow down the ca
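The trace shows MultiTermDocs.next (MultiReader.java:349) repeating over a thousand times. One plausible reading, consistent with the thread's focus on the 5000+ segments, is that moving from an exhausted segment to the next one was done with a recursive call, so stack depth grows with the number of segments skipped. The cursor below is a simplified model of that pattern, not Lucene's source; MultiSegmentCursor and its methods are hypothetical. It contrasts a recursive advance with an iterative loop that behaves the same but uses constant stack.

```java
// Simplified model of advancing a term-docs cursor across segments; not Lucene source.
class MultiSegmentCursor {
    private final int[][] segments; // per-segment matching doc ids (toy data)
    private int seg = -1;           // current segment index
    private int pos = -1;           // position within the current segment

    MultiSegmentCursor(int[][] segments) { this.segments = segments; }

    // Recursive style: on an exhausted segment, move on and call itself again.
    // Each skipped segment costs one stack frame.
    boolean nextRecursive() {
        if (seg >= 0 && pos + 1 < segments[seg].length) { pos++; return true; }
        if (seg + 1 < segments.length) { seg++; pos = -1; return nextRecursive(); }
        return false;
    }

    // Iterative style: same results, constant stack depth.
    boolean nextIterative() {
        while (true) {
            if (seg >= 0 && pos + 1 < segments[seg].length) { pos++; return true; }
            if (seg + 1 >= segments.length) return false;
            seg++;
            pos = -1;
        }
    }

    int doc() { return segments[seg][pos]; }
}
```

With enough consecutive segments that lack the term, the recursive version overflows the stack while the loop does not.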

StackOverflowError while calling IndexReader.deleteDocuments(new Term())

2006-11-27 Thread Suman Ghosh
ith a StackOverflowError while calling the IndexReader.deleteDocuments(new Term()) method (even for the document that was indexed earlier). Here is the partial stacktrace: Exception in thread "main" java.lang.StackOverflowError at java.lang.ref.Reference.<init>(Reference.java

Re: IndexReader.deleteDocuments

2006-10-15 Thread Yonik Seeley
On 10/16/06, EDMOND KEMOKAI <[EMAIL PROTECTED]> wrote: Can somebody please clarify the intended behaviour of IndexReader.deleteDocuments()? It deletes documents containing the term. The API docs are correct, the demo docs are incorrect if they say otherwise. -Yoni
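Yonik's answer can be made concrete with a toy inverted index: deleting by a term removes exactly the documents whose postings contain that term and returns how many were removed, mirroring the int return of IndexReader.deleteDocuments(Term). ToyTermDeletes and its "field:text" term strings are hypothetical. Note that a real Term is matched exactly against the indexed token; no analyzer is applied to it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index illustrating the documented contract of deleteDocuments(Term):
// every document that CONTAINS the term is deleted; all other documents are untouched.
class ToyTermDeletes {
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // "field:text" -> doc ids
    private final Set<Integer> live = new TreeSet<>();

    void add(int docId, String... terms) {
        live.add(docId);
        for (String t : terms) postings.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
    }

    // Returns the number of live documents removed.
    int deleteDocuments(String term) {
        int removed = 0;
        for (int doc : postings.getOrDefault(term, Collections.emptySet())) {
            if (live.remove(doc)) removed++;
        }
        return removed;
    }

    Set<Integer> liveDocs() { return Collections.unmodifiableSet(live); }
}
```

Deleting "id:2" from docs 1, 2 and 3 leaves docs 1 and 3; repeating the call removes nothing.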

Re: IndexReader.deleteDocuments

2006-10-15 Thread EDMOND KEMOKAI
Can somebody please clarify the intended behaviour of IndexReader.deleteDocuments()? Between the various documentation and implementations, it seems this function is broken. The API doc says it should delete docs containing the provided term, but instead it deletes all documents not containing the

Re: IndexReader.deleteDocuments

2006-10-14 Thread EDMOND KEMOKAI
rote: The javadoc is right. :) Otis - Original Message From: EDMOND KEMOKAI <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Sunday, October 15, 2006 12:49:21 AM Subject: IndexReader.deleteDocuments Hi guys, I am a newbie so excuse me if this is a repost. From the java

Re: IndexReader.deleteDocuments

2006-10-14 Thread Otis Gospodnetic
The javadoc is right. :) Otis - Original Message From: EDMOND KEMOKAI <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Sunday, October 15, 2006 12:49:21 AM Subject: IndexReader.deleteDocuments Hi guys, I am a newbie so excuse me if this is a repost. From the javadoc it

IndexReader.deleteDocuments

2006-10-14 Thread EDMOND KEMOKAI
Hi guys, I am a newbie so excuse me if this is a repost. From the javadoc it seems Reader.deleteDocuments deletes only documents that have the provided term, but the implementation examples that I have seen and from the behaviour of my own app, deleteDocuments(term) deletes documents that don't ha