Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 1:41 PM, Chris Hostetter wrote: > files with the same names should be the same, files with different names > should be very different -- but if your binary diff tool is finding > commonalities between files in new segments as the index grows over time, > and you feel like yo

Re: Forcing specific index file names

2010-12-15 Thread Chris Hostetter
: In my testing, when the filenames are the same, doing an xdelta on the : files (mainly the file that contains most of the data, the .cfs file), : there is a significant reduction in the size of the patch file created. As noted elsewhere in this thread, the filenames themselves are significant

Re: Forcing specific index file names

2010-12-15 Thread Earl Hood
On Wed, Dec 15, 2010 at 7:49 AM, Doron Cohen wrote: > Perhaps I'll change my mind after understanding the scenario that creates > this, but for now I'd rather not ignore the file name differences. It may be possible to control the data generation process, so the filenames are consistent. Chan

Re: Forcing specific index file names

2010-12-15 Thread Doron Cohen
> I could make an exception in the patch creation program to detect > that there is a Lucene directory, and diff the .cfs files, even if > they have different names, but was seeing if I can avoid that > so the patch program can be agnostic about the contents of the > directory tree. > Doing only th

Re: Forcing specific index file names

2010-12-14 Thread Earl Hood
On Tue, Dec 14, 2010 at 9:45 AM, Erick Erickson wrote: > Lucene never changes an existing segments file once it is committed. > It only merges segments then deletes the old ones. So if the file names > are different, then it seems that renaming them wouldn't be what you > really want. > > So eithe

Re: Forcing specific index file names

2010-12-14 Thread Erick Erickson
I'm missing something here. You mention "two versions of a data set in a directory tree structure". The Lucene indexes will have different names if they have been merged. Usually this is a result of changing the data, issuing an optimize, etc. That is, the data *is* different so it seems perfectly

Re: Forcing specific index file names

2010-12-14 Thread Earl Hood
On Tue, Dec 14, 2010 at 12:53 AM, Chris Hostetter wrote: > > : Is it possible to always have Lucene end up with the > : same set of index filenames for each index generation > : process? > > this smells like an XY problem: why do you care what the file names > are? that's an implementation deta

Re: Forcing specific index file names

2010-12-13 Thread Chris Hostetter
: Is it possible to always have Lucene end up with the : same set of index filenames for each index generation : process? this smells like an XY problem: why do you care what the file names are? that's an implementation detail of Lucene -- the directory as a whole is the index -- what are yo

Forcing specific index file names

2010-12-13 Thread Earl Hood
Is it possible to always have Lucene end up with the same set of index filenames for each index generation process? I have an application that creates an index for a set of files, and generally, the index files created are the following: _0.cfs segments_2 segments.gen However, it appears somet
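
For readers following the thread: the workaround mentioned in it (having the patch program diff the .cfs files even when their names differ) could look roughly like the sketch below. This is only an illustration, not code from any poster and not a Lucene API. The function name pair_index_files and the example file names _1.cfs and segments_3 are hypothetical. It pairs files with identical names first, then pairs leftover files that share an extension when exactly one such file exists on each side.

```python
from collections import defaultdict
from pathlib import Path


def pair_index_files(old_dir, new_dir):
    """Pair files from two index directories for binary diffing.

    Hypothetical helper: files with identical names are paired first;
    leftover files are paired by extension only when the match is
    unambiguous (one unmatched file with that extension on each side).
    """
    old_files = {p.name: p for p in Path(old_dir).iterdir() if p.is_file()}
    new_files = {p.name: p for p in Path(new_dir).iterdir() if p.is_file()}

    # 1. Same name in both trees: pair directly.
    pairs = [(old_files[n], new_files[n])
             for n in sorted(old_files.keys() & new_files.keys())]

    # 2. Group the remaining files by extension.
    old_left = defaultdict(list)
    new_left = defaultdict(list)
    for name in sorted(old_files.keys() - new_files.keys()):
        old_left[Path(name).suffix].append(old_files[name])
    for name in sorted(new_files.keys() - old_files.keys()):
        new_left[Path(name).suffix].append(new_files[name])

    # 3. Pair by extension when there is exactly one candidate per side.
    #    Files without an extension (e.g. segments_2) are left unpaired.
    for ext in sorted(old_left.keys() & new_left.keys()):
        if ext and len(old_left[ext]) == 1 and len(new_left[ext]) == 1:
            pairs.append((old_left[ext][0], new_left[ext][0]))

    return pairs
```

Each returned pair could then be handed to a binary diff tool such as xdelta. As several replies in the thread point out, files with different names may hold quite different data, so whether the resulting patch is actually smaller would need to be measured.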