Well, assuming you can get all the information you need out of your index, you really only have two choices that I see:

1> Iterate through your documents, deleting each document and re-adding it to that same index.
2> Iterate through your documents and add each doc to a *new* index, then replace your old index with the new one.
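For what it's worth, the loop behind option 2> might look roughly like the Java sketch below. To keep it runnable without Lucene on the classpath, every type in it (Doc, Source, Target, BoostPolicy, MemoryIndex) is a hypothetical stand-in, not Lucene's API. In a real index the read side would presumably map onto IndexReader.maxDoc(), IndexReader.isDeleted(int) and IndexReader.document(int), and the write side onto IndexWriter.addDocument(Document) with Document.setBoost(float) called before adding. The sketch also assumes that every field you care about was stored, and that the new boost is computed from stored values, since as far as I know the original index-time boost is not handed back with a retrieved document.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins only; none of these types are part of Lucene.
class ReindexSketch {

    /** Stand-in for a retrieved document: stored field values plus a boost. */
    static final class Doc {
        final Map<String, String> storedFields = new LinkedHashMap<>();
        float boost = 1.0f;
    }

    /** Read side, modeled loosely on IndexReader (maxDoc, isDeleted, document). */
    interface Source {
        int maxDoc();                  // one greater than the largest doc ID
        boolean isDeleted(int docId);  // deleted slots must be skipped
        Doc document(int docId);       // stored fields only
    }

    /** Write side, modeled loosely on IndexWriter.addDocument. */
    interface Target {
        void addDocument(Doc doc);
    }

    /** Hypothetical hook that decides the new boost for each document. */
    interface BoostPolicy {
        float boostFor(Doc doc);
    }

    /** Tiny in-memory index so the sketch can be run without Lucene. */
    static final class MemoryIndex implements Source, Target {
        final List<Doc> docs = new ArrayList<>();
        final Set<Integer> deleted = new HashSet<>();

        public int maxDoc() { return docs.size(); }
        public boolean isDeleted(int docId) { return deleted.contains(docId); }
        public Doc document(int docId) { return docs.get(docId); }
        public void addDocument(Doc doc) { docs.add(doc); }
    }

    /**
     * Walks every doc ID in the old index, skips deleted documents, and adds a
     * copy with a freshly computed boost to the new index.
     * Returns the number of documents copied.
     */
    static int copyWithNewBoosts(Source oldIndex, Target newIndex, BoostPolicy policy) {
        int copied = 0;
        for (int id = 0; id < oldIndex.maxDoc(); id++) {
            if (oldIndex.isDeleted(id)) {
                continue; // nothing to copy for a deleted slot
            }
            Doc old = oldIndex.document(id);
            Doc fresh = new Doc();
            fresh.storedFields.putAll(old.storedFields); // unstored fields are simply absent
            fresh.boost = policy.boostFor(old);          // new boost is set before adding
            newIndex.addDocument(fresh);
            copied++;
        }
        return copied;
    }
}
```

A policy such as `doc -> "remote".equals(doc.storedFields.get("source")) ? 2.0f : 1.0f` (the "source" field name is made up for illustration) would boost only the documents that came from remote sources. The old index is never modified, so if the tuning turns out badly you can just throw the new index away.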
It should be straightforward to do either; you just have to do something like IndexReader.document(#), where # is the internal Lucene doc ID. You'd have to know ahead of time what the largest existing doc ID is, but that's not hard....

But note this from the javadoc:

Note that fields which are *not* stored are *not* available in documents retrieved from the index, e.g. with Hits.doc(int), Searcher.doc(int) or IndexReader.document(int).

Given the size of your indexes, I don't think that there's much to be gained by trying to delete from/add to the same index; I'd just create a new one.

If you can't get everything you need from the index, I'm afraid you're out of luck. I might suggest that if you *do* have to go through the pain of hitting your remote resources, you store the raw data locally in case you need to do this again in the future.

I know that's not much help, but....

Or, figure out how to make Lucene update-in-place, write the code, test it and submit a patch. I'm sure Erik, Otis et al. would offer you profuse thanks <G>

Good luck
Erick

On 8/29/06, Xiaocheng Luan <[EMAIL PROTECTED]> wrote:
Thanks, Erick.
I agree that it might not be possible to reconstruct documents from an existing index, but I think document boosting (that is, one document has a higher boost factor than other documents) as well as field boosting is specified during indexing.

Our use case is performance/results tuning. We have huge indexes (in the range of dozens of GBs) and some sources are remote. I'm trying to figure out ways to avoid re-ingesting the contents as much as possible.

Any suggestions?
Thanks.
X.

Erick Erickson <[EMAIL PROTECTED]> wrote:

A couple of things..

1> I don't think you set the boost when indexing. You set the boost when querying, so you don't need to re-index for boosting.

2> A recurring theme is that you can't do an update-in-place for a Lucene document. You might search the mail archive for a discussion of this. The short form is that if you want to change every document, you're probably better off re-indexing the whole thing.

If, for some reason, you can't/don't want to just re-index it all, then be aware that if you didn't store the fields for the documents (i.e. use Field.Store.YES), then you really can't reconstruct the document from the index without potentially losing information.

Hope this helps
Erick

On 8/29/06, Xiaocheng Luan wrote:
>
> Hi,
> Got a question. Here is what I want to achieve:
>
> Create a new index from an existing index, to change the boosting factor
> for some of the documents (and potentially some other tweaks), without
> reindexing it from the source.
>
> Are there any tools or ways to do this?
> Thanks!
> Xiaocheng Luan