This just in from the thread "*Re-created fields consistently indexed"

*Erik Hatcher replied as below, and believe me, Erik knows waaaaaay more
about this than I do .....


On Aug 30, 2006, at 11:07 AM, Jason Polites wrote:
I understand that it is possible to "re-create" fields which are
indexed but
not stored (as is done by Luke), and that this is a lossy process,
however I
am wondering whether the indexed version of this remains consistent.

That is, if I re-create a non-stored field, then re-index this
field in a
new document, will the indexed (inverted) data be the same as in the
original version?

That really all depends.  For example, the original analysis process
could put in position increment gaps between sentences or overlay
terms in same positions, and don't forget about term offsets.   Just
pulling out the terms without getting those details would be even
lossier.  I'm not even sure how you could recreate that even if you
had all of that position and offset information without some kind of
special Analyzer that would set things exactly as they were.  I
suspect there is some low-level tricks that could be played to do
this bypassing an analyzer.

      Erik*
*
On 8/30/06, Erick Erickson <[EMAIL PROTECTED]> wrote:

Well, assuming you can get all the information you need out of your index,
you really only have two choices that I see.
1> iterate through your documents and delete and re-add each document to
that same index.
2> iterate through your documents and add the doc to a *new* index, then
replace your old index with the new one.

It should be straight forward to do either, you just have to do something
like IndexReader.document(#) where # is the internal Lucene doc ID. You'd
have to know ahead of time what the largest existing doc ID is, but that's
not hard....

But not this from the javadoc

Note that fields which are *not* stored are *not* available in documents
retrieved from the index, e.g. with Hits.doc(int), Searcher.doc(int) or
IndexReader.document(int).

Given the size of your indexes, I don't think that there's much to be
gained by trying to delete from/add to the same index, I'd just create a new
one.

If you can't get everything you need from the index, I'm afraid you're out
of luck. I might suggest that if you *do* have to go through the pain of
hitting your remote resources, you store the raw data locally in case you
need to do this again in the future. I know that's not much help, but....

Or, figure out how to make Lucene update-in-place, write the code, test it
and submit a patch. I'm sure Erik, Otis et.al. would offer you profuse
thanks <G>

good luck
Erick


On 8/29/06, Xiaocheng Luan <[EMAIL PROTECTED]> wrote:
>
> Thanks, Erick.
> I agree that it might be unlikely to reconstruct from an existing index,
> but I think document boosting (that is, one document has a higher boost
> factor than other documents) as well as field boosting is specified during
> indexing.
>
> Our use case is performancce/results tuning. We have huge indexes (in
> the range of dozens of GBs) and some sources are remote. I'm trying to
> figure out ways to avoid re-ingesting the contents as much as possible.
> Any suggestions?
> Thanks.
> X.
>
> Erick Erickson <[EMAIL PROTECTED]> wrote: A couple of things..
>
> 1> I don't think you set the boost when indexing. You set the boost when
>
> querying, so you don't need to re-index for boosting.
>
> 2> A recurring theme is that you can't do an update-in-place for a
> lucene
> document. You might search the mail archive for a discussion of this.
> The
> short form is that if you want to change every document, you're probably
> better off re-indexing the whole thing. If, for some reason you
> can't/don't
> want to just re-index it all, then be aware that if you didn't store the
>
> fields for the documents (i.e. use Field.Store.YES), then you really
> can't
> reconstruct the document from the index without potentially losing
> information.
>
> Hope this helps
> Erick
>
> On 8/29/06, Xiaocheng Luan  wrote:
> >
> > Hi,
> > Got a question. Here is what I want to achieve:
> >
> > Create a new index from an existing index, to change the boosting
> factor
> > for some of the documents (and potentially some other tweaks), without
>
> > reindexing it from the source.
> >
> > Is there any tools or ways to do this?
> > Thanks!
> > Xiaocheng Luan
> >
> >
> > ---------------------------------
> > Get your own web address for just $1.99/1st yr. We'll help. Yahoo!
> Small
> > Business.
> >
>
>
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com
>


Reply via email to