[ 
https://issues.apache.org/jira/browse/LUCENE-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860891#action_12860891
 ] 

Shai Erera commented on LUCENE-1585:
------------------------------------

Michael, I would like to take a stab at it if you don't mind (unless you are 
working on it). In fact, I've investigated and was about to open an issue 
before I came across this one :).

W/ Flex, one can set a Codec; however, that Codec will be used for regular
segment merges. The problem (that I've run into and you describe here) is that
during addIndexes*, one would need a different Codec for payloads, in order to
rewrite the ones that come from the external indexes. I was thinking of adding
another variation of addIndexes* which takes a PayloadConsumer as input. Then,
in SegmentMerger (SM).appendPostings, after it reads the payload from the
other index, it would invoke the PayloadConsumer on that payload, and only
after that write it to PositionConsumer.

That API would of course be marked as experimental.

Also, SM.appendPostings is called from two different code paths: addIndexes*
and regular segment merges. For regular merges, the PayloadConsumer should be
null, but for addIndexes* it may not be. So we'll need to pass that parameter
through a bunch of methods and classes in the call chain, but all of them are
either private methods or package-private classes.
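
To make the idea concrete, here is a rough, self-contained sketch of the flow
I have in mind. It does not use any real Lucene classes: the PayloadConsumer
signature, RecordingPositionWriter and appendPayloads are all made up for
illustration, and the actual patch may well look different. The consumer is
null for regular merges (payloads are copied as-is) and non-null for
addIndexes* (payloads are rewritten before they are written out).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback. The name comes from the proposal above, but this
// signature is an assumption and not an existing Lucene API.
interface PayloadConsumer {
    byte[] rewrite(byte[] payload);
}

// Hypothetical stand-in for the side that writes positions and payloads.
final class RecordingPositionWriter {
    final List<byte[]> written = new ArrayList<>();

    void addPosition(int position, byte[] payload) {
        written.add(payload);
    }
}

final class PayloadMergeSketch {
    // Mimics only the payload step of appending postings: read a payload,
    // optionally pass it through the consumer, then write it.
    static void appendPayloads(List<byte[]> sourcePayloads,
                               PayloadConsumer consumer,
                               RecordingPositionWriter out) {
        int position = 0;
        for (byte[] payload : sourcePayloads) {
            // A null consumer means a regular merge: copy the payload unchanged.
            byte[] toWrite = (consumer == null || payload == null)
                    ? payload
                    : consumer.rewrite(payload);
            out.addPosition(position++, toWrite);
        }
    }

    public static void main(String[] args) {
        List<byte[]> external = new ArrayList<>();
        external.add(new byte[] {7});

        // Example rewrite: prefix each old one-byte payload with a version byte.
        PayloadConsumer addVersion = p -> new byte[] {1, p[0]};

        RecordingPositionWriter addIndexesOut = new RecordingPositionWriter();
        appendPayloads(external, addVersion, addIndexesOut);
        System.out.println(addIndexesOut.written.get(0).length);

        RecordingPositionWriter regularMergeOut = new RecordingPositionWriter();
        appendPayloads(external, null, regularMergeOut);
        System.out.println(regularMergeOut.written.get(0).length);
    }
}
```

The null check keeps the regular merge path untouched, so only callers that
go through the new addIndexes* variation would see any change in behavior.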

How's that sound? I can cons up a patch if that sounds reasonable.

> Allow to control how payloads are merged
> ----------------------------------------
>
>                 Key: LUCENE-1585
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1585
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>
> Lucene handles backwards-compatibility of its data structures by
> converting them from the old into the new formats during segment
> merging. 
> Payloads are simply byte arrays in which users can store arbitrary
> data. Applications that use payloads might want to convert the format
> of their payloads in a similar fashion. Otherwise it's not easily
> possible to ever change the encoding of a payload without reindexing.
> So I propose to introduce a PayloadMerger class that the SegmentMerger
> invokes to merge the payloads from multiple segments. Users can then
> implement their own PayloadMerger to convert payloads from an old into
> a new format.
> In the future we need this kind of flexibility also for column-stride
> fields (LUCENE-1231) and flexible indexing codecs.
> In addition to that it would be nice if users could store version
> information in the segments file. E.g. they could store "in segment _2
> the term a:b uses payloads of format x.y".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
