Hello Jose,

Thank you for your response; I took a closer look. Below are my responses:
> Seems that you want to force a max number of segments to 1,
>
>     // you're done adding documents to it):
>     // writer.forceMerge(1);
>
>     writer.close();

Yes, that line of code is uncommented because we want to understand how it
works when indexing big data sets. Should this be a concern?

> On a previous thread someone answered that the number of segments will
> affect the Index Size, and is not related with Index Integrity (like size
> of index may vary according with number of segments).

Okay, I have no idea what the above actually means, but I would guess that
perhaps the code we added below causes this exception?

    if (file.isDirectory()) {
        String[] files = file.list();
        // an IO error could occur
        if (files != null) {
            for (int i = 0; i < files.length; i++) {
                indexDocs(writer, new File(file, files[i]), forceMerge);
                // every 1000 files, force a merge down to at most 50
                // segments and commit, to keep the segment count in check
                if (forceMerge && writer.hasPendingMerges()) {
                    if (i % 1000 == 0 && i != 0) {
                        logger.trace("forcing merge now.");
                        try {
                            writer.forceMerge(50);
                            writer.commit();
                        } catch (OutOfMemoryError e) {
                            logger.error("out of memory during merging ", e);
                            throw new OutOfMemoryError(e.toString());
                        }
                    }
                }
            }
        }
    } else {
        FileInputStream fis;
        // ... (the rest of the method is unchanged from the IndexFiles demo)

> Should be...
>
>     Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
>     IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);

Yes, we were, and still are, referencing Version.LUCENE_46 in our analyzer.
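For completeness, here is roughly how the writer is set up in our modified
demo (a minimal sketch of the relevant part of main(); CassandraDirectory is
our own class, and its constructor arguments here are placeholders):

    import java.io.File;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.index.IndexWriterConfig.OpenMode;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.util.Version;

    // our own Directory implementation on top of Cassandra
    // (placeholder constructor arguments)
    Directory dir = new CassandraDirectory("keyspace1", "index1");

    // analyzer and writer config reference the same version
    // (the stock 4.6 demo still passed Version.LUCENE_40 here)
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
    IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
    iwc.setOpenMode(OpenMode.CREATE);

    IndexWriter writer = new IndexWriter(dir, iwc);
    indexDocs(writer, new File(docsPath), forceMerge);

    // the demo line we uncommented: merge everything down to one segment
    writer.forceMerge(1);
    writer.close();

The exception in the quoted message below is thrown from that final
writer.forceMerge(1) call.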
/Jason


On Sat, Apr 5, 2014 at 9:01 PM, Jose Carlos Canova <
jose.carlos.can...@gmail.com> wrote:

> Seems that you want to force a max number of segments to 1.
> On a previous thread someone answered that the number of segments will
> affect the index size, and is not related to index integrity (i.e. the
> size of the index may vary according to the number of segments).
>
> On version 4.6 there is a small issue in the sample, which is
>
>     Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
>     IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, analyzer);
>
> Should be...
>
>     Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_46);
>     IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_46, analyzer);
>
> With this, probably the line related to the codec will change too.
>
>
> On Fri, Apr 4, 2014 at 3:52 AM, Jason Wee <peich...@gmail.com> wrote:
>
> > Hello again,
> >
> > A little background on our experiment: we are storing Lucene (version
> > 4.6.0) indexes on top of Cassandra. We are using the demo IndexFiles.java
> > from Lucene, with a minor modification so that the directory used is a
> > reference to our CassandraDirectory.
> >
> > With a large dataset (that is, indexing more than 50000 files), after
> > indexing is done and forceMerge(1) is called, we get the following
> > exception.
> >
> > BufferedIndexInput readBytes [ERROR] bufferStart = '0' bufferPosition = '1024' len = '9252' after = '10276'
> > BufferedIndexInput readBytes [ERROR] length = '8192'
> > caught a class java.io.IOException
> > with message: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> > java.io.IOException: background merge hit exception: _1(4.6):c10250 _0(4.6):c10355 _2(4.6):c10297 _3(4.6):c10217 _4(4.6):c8882 into _5 [maxNumSegments=1]
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1755)
> >     at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1691)
> >     at org.apache.lucene.store.IndexFiles.main(IndexFiles.java:159)
> > Caused by: java.io.IOException: read past EOF: CassandraSimpleFSIndexInput(_1.nvd in path="_1.cfs" slice=5557885:5566077)
> >     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:186)
> >     at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:125)
> >     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:230)
> >     at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:186)
> >     at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:159)
> >     at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:516)
> >     at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:232)
> >     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:127)
> >     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4057)
> >     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3654)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
> >     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
> >
> > We do not know what is wrong, as our understanding of Lucene is limited.
> > Can someone explain what is happening, or where the possible error source
> > might be?
> >
> > Thank you, and any advice is appreciated.
> >
> > /Jason