Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Michael McCandless
Wonderful, thanks for bringing closure!

Mike McCandless
http://blog.mikemccandless.com

On Thu, Aug 4, 2016 at 3:14 AM, Bernd Fehling <bernd.fehl...@uni-bielefeld.de> wrote:
> After updating to version 5.5.3 it looks good now.
> Thanks a lot for your help and advice.
>
> Best regards
> Bernd

Re: BufferedUpdateStreams breaks high performance indexing

2016-08-04 Thread Bernd Fehling
After updating to version 5.5.3 it looks good now.
Thanks a lot for your help and advice.

Best regards
Bernd

On 29.07.2016 at 15:04, Michael McCandless wrote:
> The deleted terms accumulate whenever you use updateDocument(Term, Doc), or
> when you do deleteDocuments(Term).
>
> Deleted queries a

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Michael McCandless
The deleted terms accumulate whenever you use updateDocument(Term, Doc), or when you do deleteDocuments(Term). Deleted queries are when you delete by query, but I don't think DIH would be doing that unless you asked it to ... maybe a Solr user/dev knows better? Mike McCandless http://blog.mikemc

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-29 Thread Bernd Fehling
Yes, with the default of 10 it performs much better. I didn't take into account that DIH uses updateDocument for adding new documents, but after thinking about the "why" I assume that this might be because you don't know if a document already exists in the index. Conclusion, using DIH and setting seg

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Michael McCandless
Hmm, your merge policy changes are dangerous: that will cause too many segments in the index, which makes it longer to apply deletes. Can you revert that and re-test? I'm not sure why DIH is using updateDocument instead of addDocument ... maybe ask on the solr-user list? Mike McCandless http://

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Bernd Fehling
Currently I use concurrent DIH but will write some SolrJ for testing or even as a replacement for DIH. I don't know what's behind DIH if only documents are added. I have not tried any newer release yet, but after reading LUCENE-6161 I really should. At least a version > 5.1, maybe before writing some SolrJ.

Re: BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Michael McCandless
Hmm not good. If you are really only adding documents, you should be using IndexWriter.addDocument, which won't buffer any deleted terms and that method call should be a no-op. It also makes flushes more efficient since all of your indexing buffer goes to the added documents, not buffered delete
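
To make the point above concrete, here is a minimal sketch in plain Java. It is a toy model only, not Lucene code: the class ToyWriter and its methods are hypothetical stand-ins that mimic the names addDocument and updateDocument, and the bookkeeping is an assumption made for illustration. It only shows why an add-only workload leaves nothing to apply later, while an update-style workload buffers one delete term per call.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical, NOT Lucene's implementation): counts how many
// delete terms would be buffered by each style of call.
class ToyWriter {
    private int docCount = 0;
    // Maps a delete term to the number of documents that existed when it was buffered.
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();

    // Add-only path: nothing is buffered for later delete application.
    void addDocument(String doc) {
        docCount++;
    }

    // Update path: modeled as "buffer a delete for idTerm, then add the document".
    void updateDocument(String idTerm, String doc) {
        bufferedDeleteTerms.put(idTerm, docCount);
        docCount++;
    }

    int bufferedDeleteTermCount() {
        return bufferedDeleteTerms.size();
    }

    int docCount() {
        return docCount;
    }
}

class BufferedDeletesSketch {
    public static void main(String[] args) {
        ToyWriter addOnly = new ToyWriter();
        ToyWriter updating = new ToyWriter();
        for (int i = 0; i < 1000; i++) {
            addOnly.addDocument("doc " + i);
            updating.updateDocument("id:" + i, "doc " + i);
        }
        // Add-only leaves no buffered delete terms; updating buffers one per unique id.
        System.out.println("add-only buffered terms: " + addOnly.bufferedDeleteTermCount());
        System.out.println("update buffered terms:   " + updating.bufferedDeleteTermCount());
    }
}
```

In this model the add-only writer ends with 0 buffered terms and the updating writer with 1000, which is the kind of accumulation that a later delete-application step would have to work through.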

BufferedUpdateStreams breaks high performance indexing

2016-07-28 Thread Bernd Fehling
While trying to get higher performance for indexing it turned out that BufferedUpdateStreams is breaking indexing performance.

public synchronized ApplyDeletesResult applyDeletesAndUpdates(...)

At IndexWriterConfig I have setRAMBufferSizeMB=1024 and the Lucene 4.10.4 API states: "Determines the a