[
https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864934#action_12864934
]
Shai Erera commented on LUCENE-1585:
------------------------------------
I went over the tests and realized I didn't write one which adds indexes into
an already populated index. Ideally, the payloads in the existing index should
not be re-processed b/c of the external ones that are added. But this doesn't
happen, as addIndexes and addIndexesNoOpt don't distinguish well between local
and external segments. It all boils down to IW.merge() which calls SM.merge()
...
Then I figured a single PayloadConsumer "might not fit all" - e.g. there are
cases where different PCs are needed for different indexes. The app can call
addIndexes one at a time, but that's not efficient. So I think the entry-level
API should be a PayloadConsumerProvider, which declares one
getPayloadConsumer(Directory) method. It returns a PC corresponding to a
Directory. It gives the app the freedom it needs to:
* Always return the same PC for all Dirs.
* Return different PCs for different Dirs.
* Return null for some Dirs, so that their payloads are not re-processed.
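
To make the three options concrete, here is a rough sketch of what such a
provider could look like. Only the names PayloadConsumerProvider and
getPayloadConsumer(Directory) come from the proposal; Directory and
PayloadConsumer below are simplified stand-ins (not the Lucene classes), and
process(), sameForAll(), skipping() and mergePayload() are hypothetical names
used for illustration only.

```java
// Sketch only: every type here is a stand-in, none of this is Lucene code.
final class PayloadConsumerProviderSketch {

    /** Stand-in for a Lucene Directory; identified by name only. */
    static final class Directory {
        final String name;
        Directory(String name) { this.name = name; }
    }

    /** Hypothetical callback that rewrites a single payload during a merge. */
    interface PayloadConsumer {
        byte[] process(byte[] payload);
    }

    /** The proposed entry point: one consumer per Directory, or null to skip it. */
    interface PayloadConsumerProvider {
        PayloadConsumer getPayloadConsumer(Directory dir);
    }

    /** Option 1: always return the same PC for all Dirs. */
    static PayloadConsumerProvider sameForAll(PayloadConsumer pc) {
        return dir -> pc;
    }

    /** Option 3: return null for one Dir so its payloads are not re-processed. */
    static PayloadConsumerProvider skipping(Directory skipped, PayloadConsumer pc) {
        return dir -> dir == skipped ? null : pc;
    }

    /** What a merge might do per payload: a null consumer means "leave it as is". */
    static byte[] mergePayload(PayloadConsumerProvider pcp, Directory dir, byte[] payload) {
        PayloadConsumer pc = pcp.getPayloadConsumer(dir);
        return pc == null ? payload : pc.process(payload);
    }

    public static void main(String[] args) {
        Directory local = new Directory("local");
        Directory external = new Directory("external");
        PayloadConsumer increment = p -> new byte[] { (byte) (p[0] + 1) };
        PayloadConsumer negate = p -> new byte[] { (byte) (-p[0]) };

        // Option 2: different PCs for different Dirs.
        PayloadConsumerProvider perDir = dir -> dir == external ? increment : negate;
        PayloadConsumerProvider externalOnly = skipping(local, increment);

        byte[] payload = { 1 };
        System.out.println(mergePayload(sameForAll(increment), local, payload)[0]); // 2
        System.out.println(mergePayload(perDir, local, payload)[0]);                // -1
        System.out.println(mergePayload(externalOnly, local, payload)[0]);          // 1
        System.out.println(mergePayload(externalOnly, external, payload)[0]);       // 2
    }
}
```

Returning null rather than a no-op consumer lets the merge code skip the
payload-rewriting path entirely for that Dir.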
Setting out to impl that, I've noticed addIndexes and addIndexesNoOpt behave
differently. While addIndexes interacts w/ the SegmentMerger directly (and
hence can easily pass it the PCP), NoOpt reads the SIs from the given Dirs,
calls maybeMerge(), which triggers SM.merge(), to merge local + external
segments. We cannot pass PCP to maybeMerge since that won't help - the call
chain hits MergeScheduler, which loops back to us when it calls IW.merge().
That seems way too complicated.
Additionally, there is no way to guarantee that PCP won't be invoked during
addIndexesNoOpt on local segments (unless it does not provide a PC for the
target Dir) ...
Therefore, I'd like to add PCP to IWC, for the following reasons:
* As I said above, there's no way to guarantee it won't be invoked on local
segments when *NoOpt is called.
* There's no clean way to ensure NoOpt passes it on to SM, w/o passing PCP
through MergeScheduler.
* It might be useful for apps that want to rewrite their payloads only over
time -- sort of a mini app-level migration tool (of just payloads).
* It cleans the API - does not affect 'backwards', no need to pass it on
through several methods until it gets to SM -- simplifies the solution.
This is an expert API, so apps that set it probably know what they're doing.
I believe they will be able to understand how to not invoke their PCs on the
target dir's segments.
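
Roughly, the config-level approach could work like the sketch below. It is
not Lucene code: WriterConfig stands in for IWC, merge() stands in for what
SM.merge() would do with payloads, and the setter/getter names, Segment and
process() are all hypothetical. The point is only that the merge code looks
the provider up from the config, so nothing has to be passed through
maybeMerge() or MergeScheduler, and that the app keeps its PC away from local
segments by returning null for the target Dir.

```java
// Sketch only: stand-in types, hypothetical names, no Lucene dependency.
final class ConfigLevelProviderSketch {

    /** Stand-in for a Lucene Directory. */
    static final class Directory {
        final String name;
        Directory(String name) { this.name = name; }
    }

    interface PayloadConsumer {
        byte[] process(byte[] payload);
    }

    interface PayloadConsumerProvider {
        PayloadConsumer getPayloadConsumer(Directory dir);
    }

    /** Stand-in for a segment: the Dir it lives in plus one payload. */
    static final class Segment {
        final Directory dir;
        final byte[] payload;
        Segment(Directory dir, byte[] payload) { this.dir = dir; this.payload = payload; }
    }

    /** Stand-in for IWC; the accessor names are assumptions, not an existing API. */
    static final class WriterConfig {
        private PayloadConsumerProvider provider;

        WriterConfig setPayloadConsumerProvider(PayloadConsumerProvider provider) {
            this.provider = provider;
            return this;
        }

        PayloadConsumerProvider getPayloadConsumerProvider() {
            return provider;
        }
    }

    /** Stand-in for the merge: it sees local and external segments alike. */
    static byte[][] merge(WriterConfig config, Segment... segments) {
        PayloadConsumerProvider pcp = config.getPayloadConsumerProvider();
        byte[][] merged = new byte[segments.length][];
        for (int i = 0; i < segments.length; i++) {
            // No provider, or null for this Dir: copy the payload unchanged.
            PayloadConsumer pc = pcp == null ? null : pcp.getPayloadConsumer(segments[i].dir);
            merged[i] = pc == null ? segments[i].payload : pc.process(segments[i].payload);
        }
        return merged;
    }

    public static void main(String[] args) {
        Directory target = new Directory("target");
        Directory external = new Directory("external");
        // Hypothetical "upgrade": prefix the old payload with a version byte.
        PayloadConsumer upgrade = p -> new byte[] { 2, p[0] };

        WriterConfig config = new WriterConfig()
                .setPayloadConsumerProvider(dir -> dir == target ? null : upgrade);

        byte[][] merged = merge(config,
                new Segment(target, new byte[] { 7 }),
                new Segment(external, new byte[] { 9 }));
        System.out.println(java.util.Arrays.toString(merged[0])); // [7]
        System.out.println(java.util.Arrays.toString(merged[1])); // [2, 9]
    }
}
```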
What do you think?
> Allow to control how payloads are merged
> ----------------------------------------
>
> Key: LUCENE-1585
> URL: https://issues.apache.org/jira/browse/LUCENE-1585
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Index
> Reporter: Michael Busch
> Assignee: Shai Erera
> Priority: Minor
> Fix For: 3.1, 4.0
>
> Attachments: LUCENE-1585_3x.patch, LUCENE-1585_trunk.patch
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging.
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".