Michael,

Our application includes indexing and archiving documents to meet
compliance requirements.

A couple of reasons led us to the merge approach:

- Source documents are written to archive media, where retrieval is
 relatively slow. On top of that, re-indexing means re-running our
 processing pipeline (including text extraction). Retrieving and
 merging minis is faster than re-processing and re-indexing from
 sources.

- In addition to index recovery, mini indexes may be combined into
 custom indexes based on policy.

 From a compliance viewpoint the mini indexes contain logically
 related documents. For example: based on a retention policy,
 documents of type x are to be kept for y years.

 One use for a custom index would be legal discovery.
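
To make the policy idea concrete, here is a rough Python sketch of how
minis might be picked for a custom index. It is only an illustration,
not our actual code: the MiniIndex record, its fields, the paths, and
the rule that a mini with no policy is left out are all assumptions
made up for this example. It only selects minis; the merge itself
would still be done by Lucene.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MiniIndex:
    """Hypothetical record describing one mini index."""
    path: str       # where the mini index lives (made-up layout)
    doc_type: str   # assumed: every document in a mini shares one type
    created: date   # day the mini index was built


def expires_on(created, years):
    """Return the date on which a mini built on `created` leaves retention."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return created.replace(year=created.year + years, day=28)


def select_minis(minis, retention_years, today, doc_types=None):
    """Pick the minis to merge into a custom index.

    retention_years maps a document type to the years it must be kept.
    doc_types optionally narrows the result, e.g. for a discovery request.
    Minis whose type has no policy are skipped (an assumption).
    """
    selected = []
    for mini in minis:
        years = retention_years.get(mini.doc_type)
        if years is None:
            continue
        if doc_types is not None and mini.doc_type not in doc_types:
            continue
        if today < expires_on(mini.created, years):
            selected.append(mini)
    # Oldest first, so the merge order is predictable.
    return sorted(selected, key=lambda m: m.created)
```

The returned paths would then be handed to whatever performs the
merge. Keeping selection separate from merging means a policy change
only touches the retention table.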

Thanks, david.

On 4/18/07, Michael D. Curtin <[EMAIL PROTECTED]> wrote:
d m wrote:

> I'd like to share index merge performance data and have a couple
> of questions about it...
>
> We (AXS-One, www.axsone.com) build one "master" index per day.
> For backup and recovery purposes, we also build many individual
> "mini" indexes from the docs added to the master index.
>
> Should one of our master indexes become unusable (for whatever
> reason - and I'm glad to say this has not yet happened), we plan to
> reconstruct it by merging its mini indexes.

The possible merge bug notwithstanding, let's take a step back in
abstraction:  are you sure the relatively-complex iterative merge
process you've described buys you anything over a simple
backup-the-whole-index approach?  Or a
backup-the-source-data-and-reindex approach?

Merging is I/O intensive, and the scheme you've outlined is re-reading
and re-writing all the index data several times anyway -- it might not
be saving you much over a full reindex.  Since the scenario you're
trying to protect against is a very rare occurrence (so far at least),
would it be better to spend your development time on improving the
application than devising (and debugging, and testing, ...) a
complicated backup and recovery scheme?

--MDC

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


