Thanks Erick, it looks like we'll have to recreate from the sources ...

Erick Erickson <[EMAIL PROTECTED]> wrote:

This just in from the thread "Re-created fields consistently indexed":
Erik Hatcher replied as below, and believe me, Erik knows waaaaaay more about this than I do .....

On Aug 30, 2006, at 11:07 AM, Jason Polites wrote:
> I understand that it is possible to "re-create" fields which are
> indexed but not stored (as is done by Luke), and that this is a lossy
> process, however I am wondering whether the indexed version of this
> remains consistent.
>
> That is, if I re-create a non-stored field, then re-index this field
> in a new document, will the indexed (inverted) data be the same as in
> the original version?

That really all depends. For example, the original analysis process
could put in position increment gaps between sentences or overlay terms
in the same positions, and don't forget about term offsets. Just pulling
out the terms without getting those details would be even lossier. I'm
not even sure how you could recreate that even if you had all of that
position and offset information without some kind of special Analyzer
that would set things exactly as they were. I suspect there are some
low-level tricks that could be played to do this bypassing an analyzer.

    Erik

On 8/30/06, Erick Erickson wrote:
>
> Well, assuming you can get all the information you need out of your
> index, you really only have two choices that I see.
> 1> iterate through your documents and delete and re-add each document
> to that same index.
> 2> iterate through your documents and add the doc to a *new* index,
> then replace your old index with the new one.
>
> It should be straightforward to do either, you just have to do
> something like IndexReader.document(#) where # is the internal Lucene
> doc ID. You'd have to know ahead of time what the largest existing doc
> ID is, but that's not hard....
>
> But note this from the javadoc:
>
> Note that fields which are *not* stored are *not* available in
> documents retrieved from the index, e.g. with Hits.doc(int),
> Searcher.doc(int) or IndexReader.document(int).
>
> Given the size of your indexes, I don't think that there's much to be
> gained by trying to delete from/add to the same index, I'd just create
> a new one.
>
> If you can't get everything you need from the index, I'm afraid you're
> out of luck. I might suggest that if you *do* have to go through the
> pain of hitting your remote resources, you store the raw data locally
> in case you need to do this again in the future. I know that's not much
> help, but....
>
> Or, figure out how to make Lucene update-in-place, write the code, test
> it and submit a patch. I'm sure Erik, Otis et al. would offer you
> profuse thanks
>
> good luck
> Erick
>
>
> On 8/29/06, Xiaocheng Luan wrote:
> >
> > Thanks, Erick.
> > I agree that it might be unlikely to reconstruct from an existing
> > index, but I think document boosting (that is, one document has a
> > higher boost factor than other documents) as well as field boosting is
> > specified during indexing.
> >
> > Our use case is performance/results tuning. We have huge indexes (in
> > the range of dozens of GBs) and some sources are remote. I'm trying to
> > figure out ways to avoid re-ingesting the contents as much as possible.
> > Any suggestions?
> > Thanks.
> > X.
> >
> > Erick Erickson wrote: A couple of things..
> >
> > 1> I don't think you set the boost when indexing. You set the boost
> > when querying, so you don't need to re-index for boosting.
> >
> > 2> A recurring theme is that you can't do an update-in-place for a
> > lucene document. You might search the mail archive for a discussion of
> > this. The short form is that if you want to change every document,
> > you're probably better off re-indexing the whole thing. If, for some
> > reason, you can't/don't want to just re-index it all, then be aware that
> > if you didn't store the fields for the documents (i.e.
> > use Field.Store.YES), then you really can't reconstruct the document
> > from the index without potentially losing information.
> >
> > Hope this helps
> > Erick
> >
> > On 8/29/06, Xiaocheng Luan wrote:
> > >
> > > Hi,
> > > Got a question. Here is what I want to achieve:
> > >
> > > Create a new index from an existing index, to change the boosting
> > > factor for some of the documents (and potentially some other tweaks),
> > > without reindexing it from the source.
> > >
> > > Are there any tools or ways to do this?
> > > Thanks!
> > > Xiaocheng Luan
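
Erik Hatcher's point earlier in the thread, that re-creating an unstored field and re-indexing it need not reproduce the original inverted data, can be illustrated with a toy model. The sketch below is plain Java and deliberately not Lucene code: the class RecreateFieldSketch, its stop-word list, and its invert/recreate methods are all hypothetical, and it assumes an analyzer that leaves a position gap wherever it drops a stop word.

```java
import java.util.*;

/** Toy model (NOT Lucene code) of why re-creating an unstored field is lossy. */
class RecreateFieldSketch {
    // Hypothetical stop-word list; a real analyzer's list would differ.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "of"));

    /** Inverts text into term -> positions. Stop words are dropped but still consume a position. */
    static SortedMap<String, List<Integer>> invert(String text) {
        SortedMap<String, List<Integer>> postings = new TreeMap<>();
        int position = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) continue; // split() can yield a leading empty token
            if (!STOP_WORDS.contains(token)) {
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
            }
            position++; // assumption: a dropped stop word leaves a position gap
        }
        return postings;
    }

    /** Rebuilds text by ordering terms by position; the gaps themselves cannot be written back as text. */
    static String recreate(SortedMap<String, List<Integer>> postings) {
        SortedMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            for (int pos : e.getValue()) {
                byPosition.put(pos, e.getKey()); // terms overlaid at one position would overwrite each other here
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        SortedMap<String, List<Integer>> original = invert("The boost of a document");
        String recreated = recreate(original);
        SortedMap<String, List<Integer>> reindexed = invert(recreated);
        System.out.println("original postings:  " + original);
        System.out.println("recreated text:     " + recreated);
        System.out.println("reindexed postings: " + reindexed);
        System.out.println("identical: " + original.equals(reindexed));
    }
}
```

In this model the terms survive the round trip but their positions do not, so phrase or proximity queries against the re-indexed field could behave differently from the original. Whether that matters in a real index depends on the analyzer that was used, which is why storing the raw data (or the fields) is the safer route.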