Re: Creating a new index from an existing index

Erick Erickson Wed, 30 Aug 2006 09:50:32 -0700

Well, assuming you can get all the information you need out of your index,
you really only have two choices that I see.
1> iterate through your documents and delete and re-add each document to
that same index.
2> iterate through your documents and add the doc to a *new* index, then
replace your old index with the new one.


It should be straightforward to do either; you just have to do something
like IndexReader.document(#) where # is the internal Lucene doc ID. You'd
have to know ahead of time what the largest existing doc ID is, but that's
not hard (IndexReader.maxDoc() returns one greater than the largest
possible doc ID).

But note this from the javadoc:

Note that fields which are *not* stored are *not* available in documents
retrieved from the index, e.g. with Hits.doc(int), Searcher.doc(int) or
IndexReader.document(int).

Given the size of your indexes, I don't think that there's much to be gained
by trying to delete from/add to the same index; I'd just create a new one.
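
For what it's worth, option 2 could be sketched roughly as follows. This is
a minimal, self-contained model in plain Java rather than real Lucene code:
the Doc and Index classes and the rebuild method are hypothetical stand-ins,
and the comments note which Lucene 2.0 call each one is meant to mirror. The
boost rule (match on one stored field) is also just an assumption for
illustration. The copy only ever sees stored field values, which is the
javadoc caveat quoted above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReindexSketch {

    /** Hypothetical stand-in for a Lucene Document: stored field values plus a boost. */
    static final class Doc {
        final Map<String, String> storedFields = new LinkedHashMap<>();
        float boost = 1.0f;

        static Doc of(String field, String value) {
            Doc d = new Doc();
            d.storedFields.put(field, value);
            return d;
        }
    }

    /** Hypothetical stand-in for an index whose slots are addressed by internal doc ID. */
    static final class Index {
        private final List<Doc> slots = new ArrayList<>();
        private final List<Boolean> deleted = new ArrayList<>();

        int maxDoc() { return slots.size(); }                  // cf. IndexReader.maxDoc()
        boolean isDeleted(int id) { return deleted.get(id); }  // cf. IndexReader.isDeleted(int)
        Doc document(int id) { return slots.get(id); }         // cf. IndexReader.document(int)
        void addDocument(Doc d) { slots.add(d); deleted.add(false); } // cf. IndexWriter.addDocument(Document)
        void delete(int id) { deleted.set(id, true); }         // marks the slot; the ID stays in range
    }

    /**
     * Copies every live document from the old index into a brand-new one.
     * Documents whose stored field equals the given value get newBoost;
     * all others get the default boost of 1.0.
     */
    static Index rebuild(Index old, String field, String value, float newBoost) {
        Index fresh = new Index();
        for (int id = 0; id < old.maxDoc(); id++) {
            if (old.isDeleted(id)) {
                continue; // deleted documents leave holes in the ID range
            }
            Doc src = old.document(id);
            Doc copy = new Doc();
            copy.storedFields.putAll(src.storedFields); // only stored values can be recovered
            copy.boost = value.equals(src.storedFields.get(field)) ? newBoost : 1.0f;
            fresh.addDocument(copy);
        }
        return fresh;
    }

    public static void main(String[] args) {
        Index old = new Index();
        old.addDocument(Doc.of("source", "local"));
        old.addDocument(Doc.of("source", "remote"));
        old.addDocument(Doc.of("source", "remote"));
        old.delete(1);

        Index fresh = rebuild(old, "source", "remote", 2.0f);
        System.out.println("live docs copied: " + fresh.maxDoc());
    }
}
```

Against a real index you would open the old one with an IndexReader, write
to a fresh directory with an IndexWriter, and swap the directories once the
copy finishes.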

If you can't get everything you need from the index, I'm afraid you're out
of luck. I might suggest that if you *do* have to go through the pain of
hitting your remote resources, you store the raw data locally in case you
need to do this again in the future. I know that's not much help, but....

Or, figure out how to make Lucene update-in-place, write the code, test it
and submit a patch. I'm sure Erik, Otis et al. would offer you profuse
thanks <G>

good luck
Erick

On 8/29/06, Xiaocheng Luan <[EMAIL PROTECTED]> wrote:

Thanks, Erick.
I agree that it might be unlikely to reconstruct from an existing index,
but I think document boosting (that is, one document has a higher boost
factor than other documents) as well as field boosting is specified during
indexing.

Our use case is performance/results tuning. We have huge indexes (in the
range of dozens of GBs) and some sources are remote. I'm trying to figure
out ways to avoid re-ingesting the contents as much as possible.
Any suggestions?
Thanks.
X.

Erick Erickson <[EMAIL PROTECTED]> wrote:

A couple of things..

1> I don't think you set the boost when indexing. You set the boost when
querying, so you don't need to re-index for boosting.

2> A recurring theme is that you can't do an update-in-place for a Lucene
document. You might search the mail archive for a discussion of this. The
short form is that if you want to change every document, you're probably
better off re-indexing the whole thing. If, for some reason, you can't/don't
want to just re-index it all, then be aware that if you didn't store the
fields for the documents (i.e. use Field.Store.YES), then you really can't
reconstruct the document from the index without potentially losing
information.

Hope this helps
Erick

On 8/29/06, Xiaocheng Luan  wrote:
>
> Hi,
> Got a question. Here is what I want to achieve:
>
> Create a new index from an existing index, to change the boosting factor
> for some of the documents (and potentially some other tweaks), without
> reindexing it from the source.
>
> Are there any tools or ways to do this?
> Thanks!
> Xiaocheng Luan
>