Re: Index size for Same DataSet.

Jose Carlos Canova Tue, 25 Mar 2014 05:42:38 -0700

Hi,

Thanks a lot for the clarification. I will do that (force merge) at the end,
just to check that everything on my side (:-)) is working correctly.
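
For reference, here is a minimal sketch of how the total on-disk size of two
index directories could be compared after each run has called
forceMerge(1) and closed its IndexWriter. It uses only the JDK, not Lucene.
The class name IndexSize, the method totalBytes, and the two directory
arguments are hypothetical names chosen for illustration. It compares sizes
only, not checksums, since the files are not expected to be byte-identical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper: sums file sizes in an index directory so that two
// test runs can be compared by total size rather than by file checksum.
final class IndexSize {

    /** Returns the total size in bytes of all regular files under dir. */
    static long totalBytes(Path dir) throws IOException {
        // Files.walk visits dir recursively; the stream must be closed.
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java IndexSize <indexDirA> <indexDirB>");
            return;
        }
        // Assumption: both directories hold indexes of the same dataset,
        // each force-merged to one segment before the writer was closed.
        long a = totalBytes(Paths.get(args[0]));
        long b = totalBytes(Paths.get(args[1]));
        System.out.printf("run A: %d bytes, run B: %d bytes, difference: %d bytes%n",
                a, b, Math.abs(a - b));
    }
}
```

If the two totals still differ noticeably after the force merge, that would
point to something on the indexing side rather than to merge ordering.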


Regards.



On Tue, Mar 25, 2014 at 5:41 AM, Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> The reason for this is multithreaded merging. While indexing, Lucene
> merges segments in separate threads. Because this runs multithreaded,
> there is no strict "order of things". Depending on how fast the disk is or
> what other processes are running in parallel, merging may proceed faster
> or slower, producing a different "index structure" in which segments are
> merged in other combinations. This leads to different term dictionary or
> posting list sizes.
>
> If you do a forceMerge(1) at the end (this can take a very long time), the
> whole index is merged into one segment, which should have the same size for
> the same dataset. Please don't compare file MD5/SHA1 checksums: the files
> will *not* be identical, because the order of documents may still vary.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> > -----Original Message-----
> > From: Jose Carlos Canova [mailto:jose.carlos.can...@gmail.com]
> > Sent: Tuesday, March 25, 2014 6:36 AM
> > To: java-user@lucene.apache.org
> > Subject: Index size for Same DataSet.
> >
> > Hello,
> >
> > I have a question about index size.
> > I am testing a program that uses Lucene to index a dataset.
> >
> > The final index size varies a little between runs. Since I haven't
> > finished the tests yet, I am wondering whether it is normal for the index
> > size to vary between different tests.
> >
> > Regards.
>
>
