On Wed, Dec 15, 2010 at 1:41 PM, Chris Hostetter
wrote:
> files with the same names should be the same, files with different names
> should be very different -- but if your binary diff tool is finding
> commonalities between files in new segments as the index grows over time,
> and you feel like yo
: In my testing, when the filenames are the same, doing an xdelta on the
: files (mainly the file that contains most of the data, the .cfs file),
: there is a significant reduction in the size of the patch file created.
As noted elsewhere in this thread, the filenames themselves are
significant
On Wed, Dec 15, 2010 at 7:49 AM, Doron Cohen wrote:
> Perhaps I'll change my mind after understanding the scenario that creates
> this, but for now I'd rather not ignore the file name differences.
It may be possible to control the data generation process, so
the filenames are consistent. Chan
> I could make an exception in the patch creation program to detect
> that there is a Lucene directory, and diff the .cfs files, even if
> they have different names, but was seeing if I can avoid that
> so the patch program can be agnostic about the contents of the
> directory tree.
>
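
For illustration only, the pairing step described in the quoted text could be sketched roughly as below. This is a hedged sketch, not the poster's actual patch program: the function names (`relative_files`, `pair_files`) and the `fallback_exts` parameter are hypothetical, and the fallback rule (pair leftover files only when a directory has exactly one unmatched file of that extension on each side) is an assumption. It only decides which files to hand to a binary diff tool such as xdelta; it does not run the diff.

```python
import os
from collections import defaultdict


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def pair_files(old_root, new_root, fallback_exts=(".cfs",)):
    """Pair files across two versions of a directory tree.

    Files with the same relative path are paired first. Leftover files
    are paired only if their extension is in fallback_exts (an assumed,
    hypothetical option) and the directory holds exactly one such
    leftover on each side. Everything else is reported as unmatched.
    """
    old = relative_files(old_root)
    new = relative_files(new_root)

    # Same-name files: the content-agnostic default.
    pairs = [(p, p) for p in sorted(old & new)]

    # Group the leftovers by (directory, extension).
    buckets = defaultdict(lambda: ([], []))
    for p in sorted(old - new):
        buckets[(os.path.dirname(p), os.path.splitext(p)[1])][0].append(p)
    for p in sorted(new - old):
        buckets[(os.path.dirname(p), os.path.splitext(p)[1])][1].append(p)

    unmatched_old, unmatched_new = [], []
    for (_dirname, ext), (olds, news) in buckets.items():
        if ext in fallback_exts and len(olds) == 1 and len(news) == 1:
            pairs.append((olds[0], news[0]))
        else:
            unmatched_old.extend(olds)
            unmatched_new.extend(news)

    return pairs, sorted(unmatched_old), sorted(unmatched_new)
```

Keeping the extension list as a parameter leaves the default behaviour agnostic about directory contents; the Lucene-specific exception only applies when `.cfs` is listed. Whether a delta between differently named .cfs files is actually small depends on the index contents.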
Doing only th
On Tue, Dec 14, 2010 at 9:45 AM, Erick Erickson wrote:
> Lucene never changes an existing segments file once it is committed.
> It only merges segments then deletes the old ones. So if the file names
> are different, then it seems that renaming them wouldn't be what you
> really want.
>
> So eithe
I'm missing something here. You mention "two versions of
a data set in a directory tree structure". The Lucene indexes
will have different names if they have been merged. Usually
this is a result of changing the data, issuing an optimize, etc.
That is, the data *is* different so it seems perfectly
On Tue, Dec 14, 2010 at 12:53 AM, Chris Hostetter
wrote:
>
> : Is it possible to always have Lucene end up with the
> : same set of index filenames for each index generation
> : process?
>
> this smells like an XY problem: why do you care what the file names
> are? that's an implementation deta
: Is it possible to always have Lucene end up with the
: same set of index filenames for each index generation
: process?
this smells like an XY problem: why do you care what the file names
are? that's an implementation detail of Lucene -- the directory as a whole
is the index -- what are yo
Is it possible to always have Lucene end up with the
same set of index filenames for each index generation
process?
I have an application that creates an index for a
set of files, and generally, the index files created
are the following:
_0.cfs segments_2 segments.gen
However, it appears somet