It has never worked, though I do think the metadata has changed over time, so the degree to which it didn't work has changed?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 22, 2016 at 4:41 PM, Stuart Goldberg <sgoldb...@fixflyer.com> wrote:

> Understood, but did it used to work?
>
> Stuart M Goldberg
> Senior Vice President of Software Development
> *FIX Flyer LLC*
> http://www.FIXFlyer.com/
>
> NOTICE TO RECIPIENT: THIS E-MAIL IS MEANT ONLY FOR THE INTENDED
> RECIPIENT(S) OF THE TRANSMISSION, AND CONTAINS CONFIDENTIAL INFORMATION
> WHICH IS PROPRIETARY TO FIX FLYER LLC ANY UNAUTHORIZED USE, COPYING,
> DISTRIBUTION, OR DISSEMINATION IS STRICTLY PROHIBITED. ALL RIGHTS TO THIS
> INFORMATION IS RESERVED BY FIX FLYER LLC. IF YOU ARE NOT THE INTENDED
> RECIPIENT, PLEASE CONTACT THE SENDER BY REPLY EMAIL AND PLEASE DELETE THIS
> E-MAIL FROM YOUR SYSTEM AND DESTROY ANY COPIES.
>
> *From:* Michael McCandless [mailto:luc...@mikemccandless.com]
> *Sent:* Monday, August 22, 2016 4:38 PM
> *To:* Stuart Goldberg <sgoldb...@fixflyer.com>
> *Cc:* Lucene Users <java-user@lucene.apache.org>
> *Subject:* Re: Problems Refactoring a Lucene Index
>
> The design is indeed trappy, and many users have hit the situation you
> have, and we have tried to fix this before (to change IndexReader.document
> to return a different class than Document), but it didn't "take":
> https://issues.apache.org/jira/browse/LUCENE-6971
>
> Have a look at FieldInfo.java to see the metadata it records.
>
> The challenge here is Lucene's schema-less-ness. For example, on a
> document by document basis, you can change how term vectors are indexed,
> whether a field is stored, or omits norms, or indexes only docs and not
> freqs, etc., for the same field across documents, across segments.
>
> Lucene only stores in FieldInfo what is necessary for it to read the index
> files, and does not store metadata beyond that.
>
> Patches welcome :) We should fix this trap; it's just that doing so is
> apparently not so easy.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Mon, Aug 22, 2016 at 11:04 AM, Stuart Goldberg <sgoldb...@fixflyer.com> wrote:
>
> Thanks for the quick response.
>
> I kind of figured on my own that I had to recreate the document from
> scratch.
>
> But there is something in your response that I don’t understand. You say
> “Lucene only preserves the metadata it needs for each field”. What does
> that mean? In my posting I gave examples of metadata returned that is
> clearly the exact opposite of the metadata that was there when originally
> indexed.
>
> According to what you are saying there is metadata that is preserved
> correctly. What metadata is that?
>
> Not sure if you are just a Lucene guru (I have your Lucene in Action
> books!) or an actual author/contributor to the code, so my observation
> might not be appropriately directed at you. But it seems a questionable API
> design to return a “Document” from the index that has properties described
> by the Javadoc that give back bogus data.
>
> And what about the FieldInfo class that purports to give back field
> information? Why have such an API if the data it provides is bogus?
>
> Stuart M Goldberg
> Senior Vice President of Software Development
> *FIX Flyer LLC*
> http://www.FIXFlyer.com/
>
> *From:* Michael McCandless [mailto:luc...@mikemccandless.com]
> *Sent:* Monday, August 22, 2016 10:48 AM
> *To:* Lucene Users <java-user@lucene.apache.org>; sgoldb...@fixflyer.com
> *Subject:* Re: Problems Refactoring a Lucene Index
>
> This is unfortunately "by design": Lucene makes no guarantees that the
> Document you retrieve from an IndexReader is precisely the same Document
> you had indexed.
>
> Lucene only preserves the metadata it needs for each field.
>
> Your only recourse is to create a new Document using your application
> level information about which fields are tokenized, indexed, etc.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Fri, Jul 8, 2016 at 12:12 PM, Stuart Goldberg <sgoldb...@fixflyer.com> wrote:
>
> As our software goes through its lifecycle, we sometimes have to alter
> existing Lucene indexes. The way I have done that in the past is to open
> the existing index for reading, read each Document, modify it and write
> that Document to a new index. At the end of the process, I delete the old
> index and rename the new index to the old name.
>
> I do not do any tokenizing and use no analyzers.
>
> I recently upgraded from Lucene 3.x to 4.10.4. Now I have the following
> problem: Suppose the existing document has 10 fields in it and there's one
> I have to modify. I remove that field and re-add it with the new settings.
> Then I add the Document in its entirety to the new index. I run into the
> following problems:
>
> * I get Exceptions thrown for the fields I don't even touch. That's
>   because their FieldType has 'tokenized' set to true and it fails because
>   I am using no analyzers. 'tokenized' is set to true even though when I
>   originally added the field to the original index I had 'tokenized' set
>   to false!
>
> * I have LongFields that come back with 'indexed' set to false even
>   though in the original index they were indexed! This makes the new index
>   not searchable on these fields and hence unusable.
>
> * I can't even alter 'indexed' for these LongFields because for some
>   reason the FieldType instance comes back frozen from the IndexReader.
>   Once frozen, you can't alter it. Even if I create a new FieldType, there
>   is no way to change the FieldType of a Field.
>
> It seems the returned FieldType contents are kind of random!
>
> I did see in the Javadoc of IndexReader.document() that field metadata is
> not returned and that, in fact, they should have a new kind of object
> returned, like 'StoredField', so there is no pretense of there being any
> metadata.
>
> I thought perhaps I could use FieldInfos. But that class returns the same
> bogus metadata. What then is the purpose of FieldInfos if the info is
> bogus?
>
> Am I not understanding something here? This is not very usable. What can I
> do to work around this? Is this a Lucene bug? Oversight?
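
The workaround suggested in the thread (build a fresh Document from application-level knowledge of each field's settings, rather than trusting the FieldType on the Document read back from the IndexReader) could be organized roughly as follows. This is only a sketch under stated assumptions: it deliberately makes no Lucene calls, and FieldSpec, PlannedField and SchemaRebuilder are hypothetical names that do not exist in Lucene. The caller would still have to turn each PlannedField into a real field and add it to the new index. It also assumes one value per field name, which a real index may not satisfy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical application-level description of how a field must be indexed.
// This is the information the thread says Lucene does not hand back reliably.
final class FieldSpec {
    final boolean indexed;
    final boolean tokenized;
    final boolean stored;

    FieldSpec(boolean indexed, boolean tokenized, boolean stored) {
        this.indexed = indexed;
        this.tokenized = tokenized;
        this.stored = stored;
    }
}

// A stored value paired with the settings the application wants for it.
final class PlannedField {
    final String name;
    final Object value;
    final FieldSpec spec;

    PlannedField(String name, Object value, FieldSpec spec) {
        this.name = name;
        this.value = value;
        this.spec = spec;
    }
}

// Rebuilds field definitions from the application's own schema.
// Only names and values are taken from the old document; its metadata is ignored.
final class SchemaRebuilder {
    private final Map<String, FieldSpec> schema;

    SchemaRebuilder(Map<String, FieldSpec> schema) {
        this.schema = new LinkedHashMap<>(schema);
    }

    List<PlannedField> rebuild(Map<String, Object> storedValues) {
        List<PlannedField> planned = new ArrayList<>();
        for (Map.Entry<String, Object> entry : storedValues.entrySet()) {
            FieldSpec spec = schema.get(entry.getKey());
            if (spec == null) {
                // Fail loudly instead of guessing settings for an unknown field.
                throw new IllegalStateException("No schema entry for field: " + entry.getKey());
            }
            planned.add(new PlannedField(entry.getKey(), entry.getValue(), spec));
        }
        return planned;
    }
}
```

The design choice here is to treat the old index purely as a source of stored values and to keep the schema somewhere the application controls, so that a change to one field's settings is a change to one schema entry.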