Hi,

On Thu, Jul 18, 2013 at 7:15 AM, Sriram Sankar <san...@gmail.com> wrote:
> The approach we have discussed in an earlier thread uses:
>
> writer.addIndexes(new SortingAtomicReader(...));
>
> I want to confirm (this is not absolutely clear to me yet) that the above
> call will not create multiple segments - i.e., the output will be optimized.

All the provided readers will be merged into a single segment, but if your
index already has segments, it will end up with one additional segment.

> We are also trying another approach - sorting the documents in Hadoop - so
> that we can repeatedly call writer.addDocument(...) providing documents in
> the correct order.
>
> How can we make sure that the final output contains documents in a single
> segment and in the order in which they were added?

You can ensure that documents stay in the order in which they were added by
using LogByteSizeMergePolicy or LogDocMergePolicy, which only merge adjacent
segments. Don't use TieredMergePolicy, which will happily merge non-adjacent
segments.

If this is an offline operation, you can just use LogByteSizeMergePolicy, add
documents in order, and run forceMerge(1) when finished.

-- 
Adrien
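
To illustrate why the choice of merge policy matters for document order, here is a toy model in plain Java. It is not Lucene code: a "segment" is just a list of doc ids, an index is a list of segments, and the class and method names (MergeOrderSketch, merge, docOrder) are made up for this sketch. It also assumes the merged segment takes the position of the first segment that went into it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not Lucene: a segment is a list of doc ids in index order,
// and an index is an ordered list of segments.
class MergeOrderSketch {

    // Merge segments i and j (i < j) into one segment that takes the place of
    // segment i. Placement at position i is an assumption of this sketch.
    static List<List<Integer>> merge(List<List<Integer>> index, int i, int j) {
        List<List<Integer>> result = new ArrayList<>();
        for (int k = 0; k < index.size(); k++) {
            if (k == i) {
                List<Integer> merged = new ArrayList<>(index.get(i));
                merged.addAll(index.get(j));
                result.add(merged);
            } else if (k != j) {
                result.add(index.get(k));
            }
        }
        return result;
    }

    // Overall document order: concatenate the segments in index order.
    static List<Integer> docOrder(List<List<Integer>> index) {
        List<Integer> order = new ArrayList<>();
        for (List<Integer> segment : index) {
            order.addAll(segment);
        }
        return order;
    }
}
```

With three segments [0, 1], [2, 3], [4, 5], merging the adjacent pair (0, 1) keeps the overall order 0..5, while merging the non-adjacent pair (0, 2) yields 0, 1, 4, 5, 2, 3. In this model, that is the difference between merging only adjacent segments and being free to pick any segments.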