Hello Jose,

Thank you for your response; I took a closer look. My responses are below:


> Seems that you want to force a max number of segments to 1,

      // NOTE: if you want to maximize search performance,
      // you can optionally call forceMerge here. This can be
      // a terribly costly operation, so generally it's only
      // worth it when your index is relatively static (ie
      // you're done adding documents to it):
      //
      writer.forceMerge(1);

      writer.close();

Yes, that line of code is uncommented because we want to understand how
it behaves when indexing big data sets. Should this be a concern?
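
For reference, this is the call order we are using at the end of
indexing (a minimal sketch; the comments are our own notes, not from
the demo):

      // ... all documents have been added by this point ...
      writer.commit();      // flush and fsync any buffered documents first
      writer.forceMerge(1); // blocking call; rewrites all segments into one (costly)
      writer.close();       // in 4.6, close() also waits for any pending merges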


> On a previous thread someone answered that the number of segments will
> affect the index size, and is not related to index integrity (i.e., the
> size of the index may vary according to the number of segments).

Okay, I am not sure what the above actually means, but could the code
we added below be the cause of this exception?

                if (file.isDirectory()) {
                    String[] files = file.list();
                    // an IO error could occur
                    if (files != null) {
                        for (int i = 0; i < files.length; i++) {
                            indexDocs(writer, new File(file, files[i]),
                                    forceMerge);
                            if (forceMerge && writer.hasPendingMerges()) {
                                if (i % 1000 == 0 && i != 0) {
                                    logger.trace("forcing merge now.");
                                    try {
                                        writer.forceMerge(50);
                                        writer.commit();
                                    } catch (OutOfMemoryError e) {
                                        logger.error("out of memory during merging ", e);
                                        throw new OutOfMemoryError(e.toString());
                                    }
                                }
                            }
                        }
                    }
                } else {
                    FileInputStream fis;
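
In case the incremental merging above is the culprit, a simpler variant
we may try is to commit periodically and leave merging to the merge
policy; this is just a sketch we have not verified:

                            indexDocs(writer, new File(file, files[i]),
                                    forceMerge);
                            if (i % 1000 == 0 && i != 0) {
                                logger.trace("committing now.");
                                // let ConcurrentMergeScheduler handle merges
                                // in the background instead of forceMerge(50)
                                writer.commit();
                            }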


> Should be...

> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
>      IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);

Yes, we were already referencing Version.LUCENE_46 in our analyzer, and
still are.
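
For completeness, here is roughly how our writer is configured now (a
sketch; CassandraDirectory is our own Directory implementation and its
constructor arguments are omitted here):

      import org.apache.lucene.analysis.Analyzer;
      import org.apache.lucene.analysis.standard.StandardAnalyzer;
      import org.apache.lucene.index.IndexWriter;
      import org.apache.lucene.index.IndexWriterConfig;
      import org.apache.lucene.index.IndexWriterConfig.OpenMode;
      import org.apache.lucene.store.Directory;
      import org.apache.lucene.util.Version;

      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
      IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
      iwc.setOpenMode(OpenMode.CREATE);             // build a fresh index
      Directory dir = new CassandraDirectory(...);  // our custom Directory
      IndexWriter writer = new IndexWriter(dir, iwc);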


/Jason



On Sat, Apr 5, 2014 at 9:01 PM, Jose Carlos Canova <jose.carlos.can...@gmail.com> wrote:

> Seems that you want to force a max number of segments to 1,
> On a previous thread someone answered that the number of segments will
> affect the index size, and is not related to index integrity (i.e., the
> size of the index may vary according to the number of segments).
>
> In version 4.6 there is a small issue in the sample, which is
>
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
>       IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
>
>
> Should be...
>
>
> Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
>       IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
>
>
> With this, the line related to the codec will probably change too.
>
>
>
> On Fri, Apr 4, 2014 at 3:52 AM, Jason Wee <peich...@gmail.com> wrote:
>
> > Hello again,
> >
> > A little background on our experiment: we are storing Lucene (version
> > 4.6.0) indexes on top of Cassandra. We are using the demo IndexFiles.java
> > from Lucene with a minor modification such that the directory used is a
> > reference to the CassandraDirectory.
> >
> > With a large dataset (that is, indexing more than 50000 files), after
> > indexing is done and forceMerge(1) is called, we get the following
> > exception.
> >
> >
> > BufferedIndexInput readBytes [ERROR] bufferStart = '0' bufferPosition = '1024' len = '9252' after = '10276'
> > BufferedIndexInput readBytes [ERROR] length = '8192'
> >  caught a class java.io.IOException
> >  with message: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> > java.io.IOException: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> >         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1755)
> >         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1691)
> >         at org.apache.lucene.store.IndexFiles.main(IndexFiles.java:159)
> > Caused by: java.io.IOException: read past EOF: CassandraSimpleFSIndexInput(_1.nvd in path="_1.cfs" slice=5557885:5566077)
> >         at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:186)
> >         at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:125)
> >         at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:230)
> >         at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:186)
> >         at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:159)
> >         at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:516)
> >         at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:232)
> >         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:127)
> >         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4057)
> >         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3654)
> >         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> >         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> >
> >
> > We do not know what is wrong, as our understanding of Lucene is limited.
> > Can someone explain what is happening, or what the possible source of
> > the error might be?
> >
> > Thank you and any advice is appreciated.
> >
> > /Jason
> >
>
