Strange. That's all I got from the log, besides the first line I wrote to mark the start of merging with a timestamp.

On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir <rcm...@gmail.com> wrote:

> Your stack trace is incomplete: it doesn't even show where the OOM
> occurred.
>
> On Sun, Apr 14, 2013 at 7:48 PM, Wei Wang <welshw...@gmail.com> wrote:
>
> > Unfortunately, I got another problem. My index has 9 segments (9 dvdd
> > files) with a total size of about 22GB. The merging step eventually failed
> > and I saw an error message:
> >
> > Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1664)
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1610)
> >     at com.ea.eadp.data.aem.audience.indexer.tools.IndexingTool.mergeIndex(IndexingTool.java:196)
> >     at com.ea.eadp.data.aem.audience.indexer.tools.AudienceIndexer.main(AudienceIndexer.java:46)
> > Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
> >     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
> >
> > I configured the JVM with "-Xmx4096m", and it seems that is still not enough
> > memory. I thought DiskDocValuesFormat puts most of the data on disk and there
> > should not be that much memory consumption. But that seems not to be the case.
> >
> > On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang <welshw...@gmail.com> wrote:
> >
> > > That makes sense.
> > >
> > > BTW, I checked the jar file. Exactly as you pointed out, the services
> > > files only contain info from lucene-core, without the codecs from
> > > lucene-codecs. After adding the maven plugin, now it is running.
> > >
> > > Thanks!
> > >
> > > On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >
> > >> Hi,
> > >>
> > >> > Thanks for the hint. I will double check the jar file.
> > >> >
> > >> > I am just a bit puzzled: if the indexing step recognizes the 'Disk' codec
> > >> > and creates the index properly, it seems the merge step that immediately
> > >> > follows indexing should also recognize the 'Disk' codec.
> > >>
> > >> This is easy to explain: By creating the custom Lucene42 Codec as a
> > >> class, you just define the disk format on the initial write (when *new*
> > >> segments are written with new documents). While merging (or force-merging),
> > >> Lucene uses the metadata that’s already on disk for the segments to merge.
> > >> The metadata on disk contains the names of all codec components used. That
> > >> metadata is also used when opening IndexReaders. It will then use SPI and
> > >> META-INF/services files to look up the class that is responsible for e.g.
> > >> the "Disk" docvalues format. Without the META-INF data, Lucene cannot
> > >> look up the segment codecs.
> > >>
> > >> Uwe
> > >>
> > >> > On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >> >
> > >> > > Are you sure that you use the ServicesResourceTransformer in your
> > >> > > shade config?
> > >> > >
> > >> > > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
> > >> > >
> > >> > > The problem is: lucene-core.jar and lucene-codecs.jar both contain
> > >> > > codec components and their classes are listed in META-INF/services. If
> > >> > > those files are not correctly merged through this resource
> > >> > > transformer, the resulting JAR file will miss some codecs.
> > >> > >
> > >> > > You can check correctness by opening the final JAR file with a ZIP
> > >> > > program and check that all files in META-INF/services contain all
> > >> > > entries merged from all Lucene JARs.
> > >> > >
> > >> > > Uwe
> > >> > >
> > >> > > -----
> > >> > > Uwe Schindler
> > >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> > > http://www.thetaphi.de
> > >> > > eMail: u...@thetaphi.de
> > >> > >
> > >> > >
> > >> > > > -----Original Message-----
> > >> > > > From: Wei Wang [mailto:welshw...@gmail.com]
> > >> > > > Sent: Sunday, April 14, 2013 11:49 PM
> > >> > > > To: java-user@lucene.apache.org
> > >> > > > Subject: Re: DiskDocValuesFormat
> > >> > > >
> > >> > > > Yes, I used Maven Shade plugin, but still have this problem. Here is
> > >> > > > the Maven output during packaging:
> > >> > > >
> > >> > > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
> > >> > > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
> > >> > > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
> > >> > > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
> > >> > > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
> > >> > > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
> > >> > > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
> > >> > > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
> > >> > > > [INFO] Replacing original artifact with shaded artifact.
> > >> > > >
> > >> > > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >> > > >
> > >> > > > > If you create a single JAR file out of multiple Lucene JAR files,
> > >> > > > > use a tool like Maven Shade plugin; otherwise, the required metadata
> > >> > > > > properties (META-INF/services) files in the JAR files are not correctly
> > >> > > > > merged together.
> > >> > > > >
> > >> > > > > -----
> > >> > > > > Uwe Schindler
> > >> > > > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> > > > > http://www.thetaphi.de
> > >> > > > > eMail: u...@thetaphi.de
> > >> > > > >
> > >> > > > >
> > >> > > > > > -----Original Message-----
> > >> > > > > > From: Wei Wang [mailto:welshw...@gmail.com]
> > >> > > > > > Sent: Sunday, April 14, 2013 11:30 PM
> > >> > > > > > To: java-user@lucene.apache.org
> > >> > > > > > Subject: Re: DiskDocValuesFormat
> > >> > > > > >
> > >> > > > > > Hi Adrien,
> > >> > > > > >
> > >> > > > > > The Lucene42Codec works well to generate the index with
> > >> > > > > > DiskDocValuesFormat. But when I tried to merge the index segments by
> > >> > > > > > calling:
> > >> > > > > >
> > >> > > > > > IndexWriter iw = new IndexWriter(directory, iw_config);
> > >> > > > > > ...
> > >> > > > > > iw.forceMerge(1);
> > >> > > > > >
> > >> > > > > > I got the following error message:
> > >> > > > > >
> > >> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type
> > >> > > > > > org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist.
> > >> > > > > > You need to add the corresponding JAR file supporting this SPI to
> > >> > > > > > your classpath.The current classpath supports the following names:
> > >> > > > > > [Lucene42]
> > >> > > > > >
> > >> > > > > > Any hint on this classpath problem? I have created a single jar file
> > >> > > > > > that has all necessary dependencies, such as lucene-codecs-4.2.0.jar. And I
> > >> > > > > > assume the indexing step works well, so Lucene already knows the
> > >> > > > > > format with name 'Disk'.
> > >> > > > > >
> > >> > > > > > Thanks.
> > >> > > > > >
> > >> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jpou...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Wei,
> > >> > > > > > >
> > >> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <welshw...@gmail.com> wrote:
> > >> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
> > >> > > > > > > > BinaryDocValuesField. It seems there are no good examples showing
> > >> > > > > > > > how to do this. The only hint I got from various docs and forums is
> > >> > > > > > > > to set some codec in IndexWriter. Could someone give a short code
> > >> > > > > > > > snippet and show how to set DiskDocValuesFormat?
> > >> > > > > > >
> > >> > > > > > > Lucene42Codec can be extended to specify the doc values format to
> > >> > > > > > > use on a per-field basis.
> > >> > > > > > > For example:
> > >> > > > > > >
> > >> > > > > > > final Codec codec = new Lucene42Codec() {
> > >> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
> > >> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
> > >> > > > > > >   @Override
> > >> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
> > >> > > > > > >     if ("dv_mem".equals(field)) {
> > >> > > > > > >       // use Lucene42 for "dv_mem"
> > >> > > > > > >       return memoryDVFormat;
> > >> > > > > > >     } else {
> > >> > > > > > >       // use Disk otherwise
> > >> > > > > > >       return diskDVFormat;
> > >> > > > > > >     }
> > >> > > > > > >   }
> > >> > > > > > > };
> > >> > > > > > >
> > >> > > > > > > Then just pass this Codec instance to your IndexWriterConfig.
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > Adrien
> > >> > > > > > >
> > >> > > > > > > ---------------------------------------------------------------------
> > >> > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
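
Uwe's suggestion in the quoted thread, checking that every file under META-INF/services in the final JAR contains the entries merged from all Lucene JARs, can also be done without a ZIP program. Below is a minimal sketch that uses only the JDK. The class name ServicesCheck and the helper parseProviders are made up for this illustration and are not part of Lucene or Maven. It assumes the usual provider-configuration file format (one class name per line, '#' starts a comment, blank lines ignored).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Lucene: prints the provider names found
// in each META-INF/services file of a JAR.
final class ServicesCheck {

    // Parses the text of one services file into provider class names,
    // dropping '#' comments, surrounding whitespace, and blank lines.
    static List<String> parseProviders(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\r?\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ServicesCheck <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory()
                        || !entry.getName().startsWith("META-INF/services/")) {
                    continue;
                }
                // Read the whole services file as text.
                StringBuilder text = new StringBuilder();
                try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                        jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        text.append(line).append('\n');
                    }
                }
                System.out.println(entry.getName() + " -> "
                        + parseProviders(text.toString()));
            }
        }
    }
}
```

Run against the shaded JAR, the line for META-INF/services/org.apache.lucene.codecs.DocValuesFormat should list a provider for the Disk format in addition to the one from lucene-core. If only the lucene-core entry shows up, the services files were probably not merged.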