RE: Lucene multithreaded indexing problems

2013-11-26 Thread Uwe Schindler
tor (e.g. Java 7 G1 Collector or Java 6 CMS Collector). Other garbage collectors may do GCs in a single thread ("stop-the-world"). Uwe

Re: Lucene multithreaded indexing problems

2013-11-26 Thread Igor Shalyminov
ollector (e.g. Java 7 G1 Collector or Java 6 CMS Collector). Other garbage collectors may do GCs in a single thread ("stop-the-world"). Uwe

RE: Lucene multithreaded indexing problems

2013-11-25 Thread Uwe Schindler
http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru] Sent: Saturday, November 23, 2013 4:46 PM To: java-user@luce

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Desidero
ported" setup :-) Lucene has no problem with that setup and can index. Be sure: - Don't give too much heap to your indexing app. Larger heaps create much more GC load. - Use a suitable garbage collector (e.g. Java 7 G1 Collector or Java

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Desidero
Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]

Re: Lucene multithreaded indexing problems

2013-11-25 Thread Igor Shalyminov
Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru] Sent:

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Daniel Penning
riginal Message- From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru] Sent: Saturday, November 23, 2013 4:46 PM To: java-user@lucene.apache.org Subject: Re: Lucene multithreaded indexing problems So we return to the initially described setup: multiple parallel workers, each making "p

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Daniel Penning
Maybe you should turn on garbage collection logging to confirm that you are running into some kind of memory problem (start the JVM with -verbose:gc). If the GC runs very often once your indexing process slows down, I would suggest creating a heap dump and checking what the memory is us
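As a complement to -verbose:gc, GC activity can also be read from inside the indexing process through the standard java.lang.management beans. The following is only a minimal sketch: the class name GcSnapshot and the idea of logging a snapshot periodically (for example every few thousand documents) are assumptions for illustration, not something proposed in the thread.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcSnapshot {

    /** Builds one line per active collector, followed by current heap usage. */
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time since JVM start (-1 if unsupported).
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        sb.append("heap: used=").append(heap.getUsed())
          .append(", max=").append(heap.getMax())
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

If the collection count and time climb sharply at the point where indexing slows down, that supports the memory-pressure hypothesis; the collector names in the output also show which garbage collector the JVM is actually using.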

RE: Lucene multithreaded indexing problems

2013-11-23 Thread Uwe Schindler
e eMail: u...@thetaphi.de -Original Message- From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru] Sent: Saturday, November 23, 2013 4:46 PM To: java-user@lucene.apache.org Subject: Re: Lucene multithreaded indexing problems So we return to the initially de

Re: Lucene multithreaded indexing problems

2013-11-23 Thread Igor Shalyminov
So we return to the initially described setup: multiple parallel workers, each doing "parse + indexWriter.addDocument()" for single documents, with no synchronization on my side. This setup was also bad in terms of memory consumption and thread blocking, as I reported. Or did I misunderstand you? -- I

Re: Lucene multithreaded indexing problems

2013-11-22 Thread Uwe Schindler
Hi, Don't use addDocuments. This method is intended for so-called block indexing (where all documents need to be in one block for block joins). Call addDocument for each document, possibly from many threads. This way Lucene can handle multithreading better and free memory early. There is really no
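The pattern described above (each worker parses one input and calls addDocument itself, with no batching and no external locking) can be sketched roughly as follows. To keep the example self-contained, DocumentSink is a hypothetical stand-in for the single IndexWriter method being used, and parse() is a placeholder for building a real Lucene Document; neither name comes from the thread or from Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexingSketch {

    /** Hypothetical stand-in for IndexWriter.addDocument; must be thread-safe. */
    interface DocumentSink {
        void addDocument(String parsedDoc) throws Exception;
    }

    /** Placeholder parse step; a real one would build a Lucene Document. */
    static String parse(String raw) {
        return raw.trim().toLowerCase();
    }

    /**
     * Submits one task per input. Each task parses and adds its own document,
     * so no chunking and no synchronization happen on the caller's side.
     * Returns the number of documents that failed to be added.
     */
    static int indexAll(List<String> rawDocs, int threads, DocumentSink sink)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger failures = new AtomicInteger();
        for (String raw : rawDocs) {
            pool.submit(() -> {
                try {
                    sink.addDocument(parse(raw));
                } catch (Exception e) {
                    failures.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return failures.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> raw = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            raw.add(" Doc " + i + " ");
        }
        AtomicInteger added = new AtomicInteger();
        int failures = indexAll(raw, 4, doc -> added.incrementAndGet());
        System.out.println("added=" + added.get() + " failures=" + failures);
    }
}
```

In a real indexer the sink would simply delegate to a shared IndexWriter instance. Note that the unbounded task queue of newFixedThreadPool holds every raw input in memory until it is processed, so a bounded queue may be preferable for very large inputs.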

Re: Lucene multithreaded indexing problems

2013-11-22 Thread Igor Shalyminov
Thanks Uwe! I changed the logic so that my workers only parse input docs into Documents, and the indexWriter calls addDocuments() itself for chunks of 100 Documents. Unfortunately, the behaviour still reproduces: memory usage slightly increases with the number of processed documents, and at

RE: Lucene multithreaded indexing problems

2013-11-21 Thread Uwe Schindler
Hi, why are you doing this? Lucene's IndexWriter can handle addDocuments in multiple threads. And, since Lucene 4, it will process them almost completely in parallel! If you do the addDocuments single-threaded, you are adding an additional bottleneck in your application. If you are doing a synchron