Re: DiskDocValuesFormat

Wei Wang Sun, 14 Apr 2013 17:05:06 -0700

Strange. That's all I got from the log, besides the first line I wrote to mark
the start of merging with a timestamp.


On Sun, Apr 14, 2013 at 4:58 PM, Robert Muir <rcm...@gmail.com> wrote:

> Your stack trace is incomplete: it doesn't even show where the OOM
> occurred.
>
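Robert's observation, that the trace does not show where the OOM occurred, can be addressed on the application side. The following is a minimal sketch, assuming the missing "Caused by" frames were lost somewhere in the logging setup. It installs a default uncaught-exception handler that prints every exception in the cause chain with all of its frames, so the frames of an OutOfMemoryError wrapped by a background thread's exception become visible. The class and method names (`MergeFailureDiagnostics`, `rootCause`, `printCauseChain`) are hypothetical, and the simulated worker thread only stands in for Lucene's merge thread.

```java
import java.io.PrintStream;

public class MergeFailureDiagnostics {

    /** Walks the cause chain and returns the innermost Throwable (t itself if it has no cause). */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Throwable.getCause() returns null once there is no further cause.
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    /** Prints each Throwable in the chain together with all of its stack frames. */
    static void printCauseChain(Throwable t, PrintStream out) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            out.println("== " + cur);
            for (StackTraceElement frame : cur.getStackTrace()) {
                out.println("    at " + frame);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Exceptions escaping a background thread go to the default handler;
        // make that handler print the whole chain instead of only the wrapper.
        Thread.setDefaultUncaughtExceptionHandler((thread, e) -> {
            System.err.println("Uncaught in " + thread.getName());
            printCauseChain(e, System.err);
        });

        // Simulated failure: a wrapper exception whose cause is the OutOfMemoryError.
        Thread worker = new Thread(() -> {
            throw new RuntimeException("merge failed", new OutOfMemoryError("Java heap space"));
        }, "simulated-merge-thread");
        worker.start();
        worker.join();
    }
}
```

In the thread above, the main thread's IllegalStateException carries no cause, so the merge thread's exception is the one worth capturing this way.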
> On Sun, Apr 14, 2013 at 7:48 PM, Wei Wang <welshw...@gmail.com> wrote:
>
> > Unfortunately, I ran into another problem. My index has 9 segments (9 dvdd
> > files) with a total size of about 22GB. The merging step eventually failed,
> > and I saw this error message:
> >
> > Exception in thread "main" java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot complete forceMerge
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1664)
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1610)
> >     at com.ea.eadp.data.aem.audience.indexer.tools.IndexingTool.mergeIndex(IndexingTool.java:196)
> >     at com.ea.eadp.data.aem.audience.indexer.tools.AudienceIndexer.main(AudienceIndexer.java:46)
> > Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
> >     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514)
> >
> > I configured the JVM with "-Xmx4096m", and it still does not seem to be
> > enough memory. I thought DiskDocValuesFormat keeps most of the data on disk,
> > so there should not be that much memory consumption. But that does not seem
> > to be the case.
> >
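Before tuning memory further, it can help to confirm that the -Xmx setting actually reached the JVM: options placed after `-jar app.jar` or after the main class name are passed to the program, not to the JVM. The sketch below is a hypothetical helper (`HeapCheck` is not from the thread) that uses only the JDK's `Runtime` and `java.lang.management` APIs to print the -Xmx options the JVM received and the maximum heap it reports.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        // Runtime.maxMemory() returns Long.MAX_VALUE when there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** The -Xmx options the JVM actually received (empty if none were given). */
    static List<String> xmxArgs() {
        List<String> result = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xmx")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("-Xmx options: " + xmxArgs());
        // The reported maximum can be somewhat below the -Xmx value,
        // depending on the JVM and garbage collector in use.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```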
> > On Sun, Apr 14, 2013 at 4:13 PM, Wei Wang <welshw...@gmail.com> wrote:
> >
> > > That makes sense.
> > >
> > > BTW, I checked the jar file. Exactly as you pointed out, the services
> > > files only contained entries from lucene-core, without the codecs from
> > > lucene-codecs. After adding the Maven plugin, it is running now.
> > >
> > > Thanks!
> > >
> > >
> > > On Sun, Apr 14, 2013 at 3:26 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >
> > >> Hi,
> > >>
> > >> > Thanks for the hint. I will double check the jar file.
> > >> >
> > >> > I am just a bit puzzled: if the indexing step recognizes the 'Disk' codec
> > >> > and creates the index properly, it seems the merge step that immediately
> > >> > follows indexing should also recognize the 'Disk' codec.
> > >>
> > >> This is easy to explain: by creating the custom Lucene42 codec as a class,
> > >> you only define the disk format for the initial write (when *new* segments
> > >> are written with new documents). While merging (or force-merging), Lucene
> > >> uses the metadata that is already on disk for the segments to merge. This
> > >> metadata contains the names of all codec components used, and it is also
> > >> used when opening IndexReaders. Lucene then uses SPI and the
> > >> META-INF/services files to look up the class that is responsible for,
> > >> e.g., the "Disk" docvalues format. Without the META-INF data, Lucene cannot
> > >> look up the segment codecs.
> > >>
> > >> Uwe
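To see what the SPI lookup Uwe describes can actually find, one can list every copy of the relevant META-INF/services file visible on the classpath. The files follow the standard provider-configuration format also used by `java.util.ServiceLoader` (one class name per line, UTF-8, `#` starts a comment). This is a sketch using only JDK APIs; the class name `ServiceEntries` is hypothetical, and the service name `org.apache.lucene.codecs.DocValuesFormat` is assumed to be the file name Lucene 4.x uses for doc values formats.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class ServiceEntries {

    /**
     * Parses a provider-configuration file: one class name per line, '#' starts a
     * comment, surrounding whitespace and blank lines are ignored.
     * The caller is responsible for closing the stream.
     */
    static List<String> parseServiceFile(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    /** Collects (and prints) the entries of every META-INF/services/<serviceName> on the classpath. */
    static List<String> listProviders(String serviceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ServiceEntries.class.getClassLoader();
        }
        List<String> all = new ArrayList<>();
        Enumeration<URL> urls = loader.getResources("META-INF/services/" + serviceName);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (InputStream in = url.openStream()) {
                List<String> names = parseServiceFile(in);
                System.out.println(url + " -> " + names);
                all.addAll(names);
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Service whose implementations are looked up by name ("Disk", "Lucene42", ...).
        listProviders("org.apache.lucene.codecs.DocValuesFormat");
    }
}
```

Run with the shaded JAR on the classpath; with the broken build described in this thread, one would expect only the lucene-core entry to show up.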
> > >>
> > >> > On Sun, Apr 14, 2013 at 3:03 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >> >
> > >> > > Are you sure that you use the ServicesResourceTransformer in your
> > >> > > shade config?
> > >> > >
> > >> > >
> > >> > > http://maven.apache.org/plugins/maven-shade-plugin/examples/resource-transformers.html#ServicesResourceTransformer
> > >> > >
> > >> > > The problem is: lucene-core.jar and lucene-codecs.jar both contain
> > >> > > codec components, and their classes are listed in META-INF/services.
> > >> > > If those files are not correctly merged through this resource
> > >> > > transformer, the resulting JAR file will miss some codecs.
> > >> > >
> > >> > > You can check correctness by opening the final JAR file with a ZIP
> > >> > > program and checking that all files in META-INF/services contain all
> > >> > > entries merged from all Lucene JARs.
> > >> > >
> > >> > > Uwe
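The ZIP-program check Uwe suggests can also be scripted with `java.util.jar.JarFile` from the JDK. The sketch below (hypothetical class `JarServicesDump`; the JAR path is whatever your build produces, passed as the first argument) prints every META-INF/services file in the JAR with the class names it lists, so a missing lucene-codecs entry stands out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarServicesDump {
    private static final String PREFIX = "META-INF/services/";

    /** Maps each service file name in the JAR to the class names it lists (comments and blanks dropped). */
    static Map<String, List<String>> readServices(String jarPath) throws IOException {
        Map<String, List<String>> result = new TreeMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(PREFIX)) {
                    continue;
                }
                List<String> classes = new ArrayList<>();
                try (InputStream in = jar.getInputStream(entry);
                     BufferedReader reader = new BufferedReader(
                             new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        int hash = line.indexOf('#');
                        String cls = (hash >= 0 ? line.substring(0, hash) : line).trim();
                        if (!cls.isEmpty()) {
                            classes.add(cls);
                        }
                    }
                }
                result.put(name.substring(PREFIX.length()), classes);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the shaded JAR produced by the build.
        for (Map.Entry<String, List<String>> service : readServices(args[0]).entrySet()) {
            System.out.println(service.getKey());
            for (String cls : service.getValue()) {
                System.out.println("    " + cls);
            }
        }
    }
}
```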
> > >> > >
> > >> > > -----
> > >> > > Uwe Schindler
> > >> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> > > http://www.thetaphi.de
> > >> > > eMail: u...@thetaphi.de
> > >> > >
> > >> > >
> > >> > > > -----Original Message-----
> > >> > > > From: Wei Wang [mailto:welshw...@gmail.com]
> > >> > > > Sent: Sunday, April 14, 2013 11:49 PM
> > >> > > > To: java-user@lucene.apache.org
> > >> > > > Subject: Re: DiskDocValuesFormat
> > >> > > >
> > >> > > > Yes, I used the Maven Shade plugin, but I still have this problem. Here
> > >> > > > is the Maven output during packaging:
> > >> > > >
> > >> > > > [INFO] --- maven-shade-plugin:2.0:shade (default) @ audience-profile-indexer ---
> > >> > > > [INFO] Including commons-collections:commons-collections:jar:3.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.mockito:mockito-core:jar:1.9.5 in the shaded jar.
> > >> > > > [INFO] Including org.hamcrest:hamcrest-core:jar:1.1 in the shaded jar.
> > >> > > > [INFO] Including org.objenesis:objenesis:jar:1.0 in the shaded jar.
> > >> > > > [INFO] Including junit:junit:jar:4.11 in the shaded jar.
> > >> > > > [INFO] Including log4j:log4j:jar:1.2.17 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-core:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-queries:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-queryparser:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-sandbox:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including jakarta-regexp:jakarta-regexp:jar:1.4 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-analyzers-common:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including org.apache.lucene:lucene-codecs:jar:4.2.1 in the shaded jar.
> > >> > > > [INFO] Including commons-lang:commons-lang:jar:2.6 in the shaded jar.
> > >> > > > [INFO] Including commons-logging:commons-logging:jar:1.1.1 in the shaded jar.
> > >> > > > [INFO] Including commons-io:commons-io:jar:2.4 in the shaded jar.
> > >> > > > [INFO] Replacing original artifact with shaded artifact.
> > >> > > >
> > >> > > > On Sun, Apr 14, 2013 at 2:40 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > >> > > >
> > >> > > > > If you create a single JAR file out of multiple Lucene JAR files,
> > >> > > > > use a tool like the Maven Shade plugin; otherwise, the required
> > >> > > > > metadata files (META-INF/services) in the JAR files are not
> > >> > > > > correctly merged together.
> > >> > > > >
> > >> > > > >
> > >> > > > >
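Uwe's point is that same-named META-INF/services files from different JARs must be merged rather than one copy overwriting the others. The sketch below illustrates that idea on in-memory data only; it is not the maven-shade-plugin's code, and it makes no claim about whether the real ServicesResourceTransformer removes duplicates. The class name `ServicesMerge` is hypothetical, and the Lucene class names in `main` are abbreviated, illustrative contents of each JAR's file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ServicesMerge {

    /**
     * Merges the META-INF/services contents of several JARs. Each input map goes from a
     * service file name to the class names listed in that JAR's copy of the file.
     * Same-named files are concatenated; duplicates are dropped, first-seen order is kept.
     */
    static Map<String, List<String>> merge(List<Map<String, List<String>>> jars) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> jar : jars) {
            for (Map.Entry<String, List<String>> service : jar.entrySet()) {
                merged.computeIfAbsent(service.getKey(), k -> new LinkedHashSet<>())
                      .addAll(service.getValue());
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : merged.entrySet()) {
            result.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        String service = "org.apache.lucene.codecs.DocValuesFormat";
        // Abbreviated, illustrative contents of each JAR's copy of that file.
        Map<String, List<String>> core = Collections.singletonMap(service,
                Arrays.asList("org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat"));
        Map<String, List<String>> codecs = Collections.singletonMap(service,
                Arrays.asList("org.apache.lucene.codecs.diskdv.DiskDocValuesFormat"));
        // Keeping only one copy (as happened in this thread, where only lucene-core's
        // entries survived) loses a list; merging keeps both entries.
        System.out.println(merge(Arrays.asList(core, codecs)));
    }
}
```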
> > >> > > > > > -----Original Message-----
> > >> > > > > > From: Wei Wang [mailto:welshw...@gmail.com]
> > >> > > > > > Sent: Sunday, April 14, 2013 11:30 PM
> > >> > > > > > To: java-user@lucene.apache.org
> > >> > > > > > Subject: Re: DiskDocValuesFormat
> > >> > > > > >
> > >> > > > > > Hi Adrien,
> > >> > > > > >
> > >> > > > > > The Lucene42Codec works well to generate the index with
> > >> > > > > > DiskDocValuesFormat. But when I tried to merge the index segments
> > >> > > > > > by calling:
> > >> > > > > >
> > >> > > > > > IndexWriter iw = new IndexWriter(directory, iw_config);
> > >> > > > > > // ...
> > >> > > > > > iw.forceMerge(1);
> > >> > > > > >
> > >> > > > > > I got the following error message:
> > >> > > > > >
> > >> > > > > > Caused by: java.lang.IllegalArgumentException: A SPI class of type org.apache.lucene.codecs.DocValuesFormat with name 'Disk' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath.The current classpath supports the following names: [Lucene42]
> > >> > > > > >
> > >> > > > > > Any hint on this classpath problem? I have created a single jar file
> > >> > > > > > that has all the necessary dependencies, such as
> > >> > > > > > lucene-codecs-4.2.0.jar. And since the indexing step works well, I
> > >> > > > > > assume Lucene already knows the format with the name 'Disk'.
> > >> > > > > >
> > >> > > > > > Thanks.
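The error above concerns a *name* lookup, which is different from the class being absent. A quick way to separate the two cases is to check whether the implementing class can be loaded at all. This is a sketch with a hypothetical helper (`ClassPresence`); the class name `org.apache.lucene.codecs.diskdv.DiskDocValuesFormat` is assumed to be where the 'Disk' format lives in lucene-codecs 4.2. If this reports the class as present while the 'Disk' lookup still fails, the META-INF/services registration is the likely culprit, which matches the diagnosis later in the thread.

```java
public class ClassPresence {

    /** True if the named class can be found by this class loader; the class is not initialized. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassPresence.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed implementation class of the 'Disk' doc values format (lucene-codecs 4.2).
        String disk = "org.apache.lucene.codecs.diskdv.DiskDocValuesFormat";
        System.out.println(disk + " on classpath: " + isLoadable(disk));
    }
}
```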
> > >> > > > > >
> > >> > > > > > On Sat, Apr 13, 2013 at 4:25 AM, Adrien Grand <jpou...@gmail.com> wrote:
> > >> > > > > >
> > >> > > > > > > Hi Wei,
> > >> > > > > > >
> > >> > > > > > > On Sat, Apr 13, 2013 at 7:44 AM, Wei Wang <welshw...@gmail.com> wrote:
> > >> > > > > > > > I am trying to use DiskDocValuesFormat for a particular
> > >> > > > > > > > BinaryDocValuesField. There seem to be no good examples showing
> > >> > > > > > > > how to do this. The only hint I got from various docs and forums
> > >> > > > > > > > is to set a codec in IndexWriter. Could someone give a few lines
> > >> > > > > > > > of code showing how to set DiskDocValuesFormat?
> > >> > > > > > >
> > >> > > > > > > Lucene42Codec can be extended to specify the doc values format to
> > >> > > > > > > use on a per-field basis. For example:
> > >> > > > > > >
> > >> > > > > > > import org.apache.lucene.codecs.Codec;
> > >> > > > > > > import org.apache.lucene.codecs.DocValuesFormat;
> > >> > > > > > > import org.apache.lucene.codecs.diskdv.DiskDocValuesFormat;
> > >> > > > > > > import org.apache.lucene.codecs.lucene42.Lucene42Codec;
> > >> > > > > > > import org.apache.lucene.codecs.lucene42.Lucene42DocValuesFormat;
> > >> > > > > > >
> > >> > > > > > > final Codec codec = new Lucene42Codec() {
> > >> > > > > > >   final Lucene42DocValuesFormat memoryDVFormat = new Lucene42DocValuesFormat();
> > >> > > > > > >   final DiskDocValuesFormat diskDVFormat = new DiskDocValuesFormat();
> > >> > > > > > >   @Override
> > >> > > > > > >   public DocValuesFormat getDocValuesFormatForField(String field) {
> > >> > > > > > >     if ("dv_mem".equals(field)) {
> > >> > > > > > >       // use Lucene42 for "dv_mem"
> > >> > > > > > >       return memoryDVFormat;
> > >> > > > > > >     } else {
> > >> > > > > > >       // use Disk otherwise
> > >> > > > > > >       return diskDVFormat;
> > >> > > > > > >     }
> > >> > > > > > >   }
> > >> > > > > > > };
> > >> > > > > > >
> > >> > > > > > > Then just pass this Codec instance to your IndexWriterConfig.
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > Adrien
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > ---------------------------------------------------------------------
> > >> > > > > > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > >> > > > > > > For additional commands, e-mail: java-user-h...@lucene.apache.org
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >>
> > >>
> > >>
> > >>
> > >
> >
>
