Hi Erick,

Thank you.

Deleting old files is fine (and expected). It sounds like the segment
files are immutable (prior to deletion), and the file that tracks deleted
documents is written under a new name with every change, so it's
effectively immutable, too.

That leaves the segments_* files and segments.gen, if I understand
correctly.
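To make that split concrete, here is a rough Python sketch that sorts a directory listing into per-segment groups and commit files. The regular expression encodes my assumption about the 4.x naming scheme (per-segment files start with an underscore plus a short lowercase alphanumeric name); `classify` is a name I made up for illustration, not anything from Lucene.

```python
import re
from collections import defaultdict

# Assumption: per-segment files look like "_<name>.<ext>" or "_<name>_<suffix>",
# and commit metadata lives in segments_* plus segments.gen (4.x, per this thread).
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def classify(names):
    """Split index file names into per-segment groups, commit files, and the rest."""
    segments = defaultdict(list)
    commit_files, other = [], []
    for name in sorted(names):
        match = SEGMENT_FILE.match(name)
        if match:
            # Everything sharing the "_N" prefix belongs to one segment.
            segments[match.group(1)].append(name)
        elif name == "segments.gen" or name.startswith("segments_"):
            commit_files.append(name)
        else:
            other.append(name)
    return dict(segments), commit_files, other
```

Anything that lands in the third list would need a closer look before I could treat it as immutable.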

And thank you for the pointer. I'm hoping to use the same process to back
up and restore all my data (Lucene and otherwise), and to be able to use an
incremental approach so that the system doesn't need to be offline for too
long, but I'll definitely take another look at snapshots.
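The incremental approach I have in mind is roughly the following Python sketch. It assumes nothing is writing to the source directory while it runs (for example, the writer is closed or a snapshot is being held), and it treats every file as immutable except the names passed in `mutable`, which are recopied on every run. The function name, the parameters, and the choice of segments.gen as the only mutable file are my own assumptions for illustration.

```python
import shutil
from pathlib import Path


def incremental_backup(src, dst, mutable=("segments.gen",)):
    """Mirror src into dst, assuming files never change once written.

    Names in `mutable` are the exception and are recopied every run.
    Assumes nothing writes to src while this runs.
    """
    src, dst = Path(src), Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir() if p.is_file()}
    dst_names = {p.name for p in dst.iterdir() if p.is_file()}

    # New files, plus any known-mutable files, get copied.
    copied = sorted((src_names - dst_names) | (src_names & set(mutable)))
    # Files that disappeared from the source are removed from the backup.
    removed = sorted(dst_names - src_names)

    for name in copied:
        shutil.copy2(src / name, dst / name)
    for name in removed:
        (dst / name).unlink()
    return copied, removed
```

Because it only compares file names, each run should cost time proportional to what changed since the last backup, which is the point of relying on immutability.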

Thanks again


On Sat, Sep 12, 2015 at 12:50 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> The Lucene index segment files are immutable: once they're closed,
> they are never changed. These are things like _1.fdt, _1.tim, etc. All
> of the files with the same prefix (_1 in my example) comprise a single
> "segment". Segments _will_, however, disappear. During indexing, two
> or more segments are merged into a new segment, so _1.*, _2.* and
> _3.* could be merged into _4.*, after which _1.*, _2.* and _3.* are
> removed.
>
> There is one exception to the rule "segment files are not changed",
> and that's the file that contains information about documents in that
> segment that have been deleted. Even that file is not modified in
> place: each commit that deletes docs from the segment rewrites it
> under a new name.
>
> The other exception is the file or two recording which segments make
> up the most recent (hard) commit: in 4.x, segments_* and
> segments.gen.
>
> So rather than try to wrap your head around all this and then worry
> about what changes when the next major release comes out, would it
> work to just use the built-in snapshot process? Here's something I
> found (but didn't look at very closely) to get you started:
>
> http://stackoverflow.com/questions/17753226/lucene-4-3-1-backup-process
>
> That page also links to the Lucene users' list thread where the
> question was answered.
>
> Best,
> Erick
>
> On Sat, Sep 12, 2015 at 7:59 AM, Larry White <lwh...@tracelink.com> wrote:
> > Hi,
> >
> > I'm writing a backup routine for a system that includes Lucene for
> > full-text search. The primary data store is based on immutable files, so
> > it can be backed up incrementally by copying any new files (and removing
> > from earlier backups any files that have since been deleted). It's my
> > understanding from brief comments found on the internet that most, if not
> > all, of the files that comprise a Lucene index are similarly immutable.
> >
> > Can someone please confirm or deny that statement?
> >
> > If the Lucene files are mostly, but not entirely, immutable, it would be
> > greatly appreciated if the exceptions could be identified. I would
> > imagine there might be log files that would be mutable, for example.
> >
> > Thank you very much for your help.
> >
> > Larry


-- 
*Larry White |  TraceLink Inc. | Principal Software Architect*
400 Riverpark Dr. | North Reading, MA | 01864
e: lwh...@tracelink.com
www.tracelink.com


*Protect patients, enable health, grow profits, ensure compliance*
