[
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-5618:
-------------------------------
Attachment: LUCENE-5618.patch
Patch addresses the following:
* Modifies Lucene45/42DocValuesProducer to assert that all encoded fields exist
in the FieldInfos.
* Improves the readability of ReaderAndUpdates.writeFieldUpdates by breaking
the update-writing logic out into separate methods.
* Each DocValues field's updates are written to separate files.
* Adds SegmentCommitInfo.docValuesGen, separate from fieldInfosGen.
* Fixes LUCENE-5636 by tracking per-field updates files, as well as fieldInfos
files.
** Per-generation update files are kept, but deprecated, since they are needed
for back-compat with 4.6-4.8 indexes. These become empty after the segment is
merged.
* Improves {{testDeleteUnusedUpdatesFiles}} to cover two fields' updates (this
exposes the bug in LUCENE-5636).
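To illustrate the per-field tracking idea from the list above, here is a minimal sketch. It is not the code in the patch: the class {{PerFieldUpdateFiles}}, its methods, and the file-naming scheme are all hypothetical and only model the bookkeeping. Each field number maps to the files holding its latest updates, so when a field is updated again its older files become unreferenced, independently of other fields.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of per-field update-file tracking (hypothetical, not Lucene's API). */
final class PerFieldUpdateFiles {
    // field number -> files that hold this field's most recent doc-values updates
    private final Map<Integer, Set<String>> filesByField = new HashMap<>();

    /**
     * Records that a field's updates were written at the given generation.
     * Returns the files that are no longer referenced by that field.
     */
    Set<String> recordUpdate(String segmentName, int fieldNumber, long docValuesGen) {
        Set<String> newFiles = new TreeSet<>();
        // Made-up naming scheme: one file per (generation, field) pair.
        newFiles.add(segmentName + "_" + docValuesGen + "_f" + fieldNumber + ".dvd");
        Set<String> previous = filesByField.put(fieldNumber, newFiles);
        return previous == null ? new TreeSet<>() : previous;
    }

    /** All files currently referenced by any field. */
    Set<String> referencedFiles() {
        Set<String> all = new TreeSet<>();
        for (Set<String> files : filesByField.values()) {
            all.addAll(files);
        }
        return all;
    }
}
```

In this model, updating field 3 at generations 1 and 3 and field 5 at generation 2 leaves only the generation-3 file for field 3 and the generation-2 file for field 5 referenced. The generation-1 file is returned as stale and could be deleted.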
In terms of backwards compatibility, indexes created with 4.6-4.8 will continue
to reference unneeded files until the segment is merged. This is impossible to
fix without breaking back-compat or introducing weird hacks which assume the
default codec. This is not terrible though, since the number of
unneeded-but-referenced files is limited by the number of DV fields the app has
updated.
I'd appreciate a review on this. Before I commit it though, I want to take care
of LUCENE-5619, so we're sure the back-compat logic in this patch indeed works.
> DocValues updates send wrong fieldinfos to codec producers
> ----------------------------------------------------------
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Robert Muir
> Priority: Blocker
> Fix For: 4.9
>
> Attachments: LUCENE-5618.patch
>
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write
> "batches" of fields in updates but just have only one field per gen?
> This removes many-to-many relationships and would make things easy to understand.