G1 and CMS are both tuned primarily for low pauses, which is typically preferred when searching an index. In this case I guess that indexing throughput is preferred, in which case ParallelGC might be the better choice.
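For example (a sketch; the heap size and "indexer.jar" are placeholders for your own values):

    # throughput-oriented collector, often better for bulk indexing
    java -Xmx4g -XX:+UseParallelGC -XX:+UseParallelOldGC -jar indexer.jar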

On 23.11.2013 17:15, Uwe Schindler wrote:
Hi,

Maybe your heap size is just too big, so your JVM spends too much time in GC? The setup
you described in your last eMail is the "officially supported" setup :-) Lucene
has no problem with that setup and can index. Be sure:
- Don't give too much heap to your indexing app. Larger heaps create much more
GC load.
- Use a suitable garbage collector (e.g. the Java 7 G1 collector or the Java 6 CMS
collector); see the sketch after this list. Other garbage collectors may do GCs in a
single thread ("stop-the-world").
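For illustration, enabling those two collectors looks roughly like this (heap size and jar name are placeholders):

    # Java 7: G1
    java -Xmx2g -XX:+UseG1GC -jar indexer.jar
    # Java 6: CMS
    java -Xmx2g -XX:+UseConcMarkSweepGC -jar indexer.jar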

Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-----Original Message-----
From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
Sent: Saturday, November 23, 2013 4:46 PM
To: java-user@lucene.apache.org
Subject: Re: Lucene multithreaded indexing problems

So we return to the initially described setup: multiple parallel workers, each
doing "parse + indexWriter.addDocument()" for single documents, with no
synchronization on my side. This setup was also bad in terms of memory consumption
and thread blocking, as I reported.

Or did I misunderstand you?

--
Igor

22.11.2013, 23:34, "Uwe Schindler" <u...@thetaphi.de>:
Hi,
Don't use addDocuments. That method is made for so-called block
indexing (where all documents need to be in one block, for block joins). Call
addDocument for each document, possibly from many threads. That way
Lucene can better handle multithreading and free memory early. There is
really no need to use bulk adds; they are solely for block joins, where docs need
to be sequential and without gaps.
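For example, a minimal sketch against the Lucene 4.x API (the field layout, analyzer, index path, and loadInputs() are made-up placeholders for your own parsing code):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ParallelIndexer {
  public static void main(String[] args) throws Exception {
    final IndexWriter writer = new IndexWriter(
        FSDirectory.open(new java.io.File("index")),
        new IndexWriterConfig(Version.LUCENE_45,
            new StandardAnalyzer(Version.LUCENE_45)));
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (final String raw : loadInputs()) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            // parse, then add exactly one document per call;
            // IndexWriter is thread-safe, no external locking needed
            Document doc = new Document();
            doc.add(new TextField("body", raw, Field.Store.NO));
            writer.addDocument(doc);
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    writer.close(); // commits and releases all resources
  }

  // placeholder for whatever produces your raw documents
  private static List<String> loadInputs() {
    return Arrays.asList("first document", "second document");
  }
}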
Uwe

Igor Shalyminov <ishalymi...@yandex-team.ru> schrieb:

- uwe@

Thanks Uwe!

I changed the logic so that my workers only parse input docs into
Documents, and the indexWriter does addDocuments() by itself for
chunks of 100 Documents.
Unfortunately, this behaviour reproduces: memory usage slightly
increases with the number of processed documents, and at some point
the program runs very slowly, and it seems that only a single thread
is active.
It happens after lots of parse/index cycles.

The current instance is now in the "single-thread" phase, with ~100%
CPU and 8397M RES memory (the limit for the VM is -Xmx8G).
My question is: when does addDocuments() release all the resources passed
in (the Documents themselves)?
Are the resources released when the call returns, or do I
have to call indexWriter.commit() after, say, each chunk?

--
Igor

21.11.2013, 19:59, "Uwe Schindler" <u...@thetaphi.de>:
  Hi,

  Why are you doing this? Lucene's IndexWriter can handle addDocument
calls from multiple threads. And, since Lucene 4, it will process them almost
completely in parallel!
  If you do the addDocument calls single-threaded, you are adding an
additional bottleneck to your application. If you are synchronizing
on IndexWriter (which I hope you are not doing), things
will go wrong, too.
  Uwe

  -----
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
   -----Original Message-----
   From: Igor Shalyminov [mailto:ishalymi...@yandex-team.ru]
   Sent: Thursday, November 21, 2013 4:45 PM
   To: java-user@lucene.apache.org
   Subject: Lucene multithreaded indexing problems

   Hello!

   I tried to perform indexing in multiple threads, with a FixedThreadPool of
   Callable workers.
   The main operation - parsing a single document and adding it to the
   index with addDocument() - is done by a single worker.
   After parsing a document, a lot (really a lot) of Strings appear, and at the
   end of the worker's call() all of them go to the indexWriter.
   I use no merging; the resources are flushed to disk when the segment size
   limit is reached.
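   Roughly this kind of configuration (a sketch, not the exact code; Lucene 4.x
   names, with the analyzer, index path, and buffer size as placeholders):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.NoMergePolicy;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.Version;

    IndexWriterConfig cfg = new IndexWriterConfig(
        Version.LUCENE_45, new StandardAnalyzer(Version.LUCENE_45));
    cfg.setMergePolicy(NoMergePolicy.NO_COMPOUND_FILES); // no background merges
    cfg.setRAMBufferSizeMB(256.0); // flush a segment at ~256 MB of buffered docs
    IndexWriter writer = new IndexWriter(
        FSDirectory.open(new java.io.File("index")), cfg);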

   The problem is, after a little while (when most of the heap memory is
   used) the indexer makes no progress, and CPU load is a constant 100% (no
   difference whether there are 2 threads or 32). So I think at some point
   garbage collection takes the whole indexing process down.

   Could you please give some advice on proper concurrent indexing with
   Lucene?
   Can there be "memory leaks" somewhere in the indexWriter? Maybe I
   must perform some operations on the writer to release unused resources
   from time to time?

   --
   Best Regards,
   Igor
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
