[
https://issues.apache.org/jira/browse/LUCENE-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966076#action_12966076
]
Uwe Schindler commented on LUCENE-2789:
---------------------------------------
Can we move the CFS code outside, so that the codec simply calls a
class/component/whatever during merging and says: I have these files and want
to create a CFS out of them? Something similar would work for reading.
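
As a rough illustration of the idea above, here is a minimal Java sketch of
such a stand-alone component: the caller hands over a set of files and gets one
compound blob back, and the reading side does the reverse. The class name
CompoundFileSketch, its pack/unpack methods and the byte layout are all
assumptions made up for this example; this is not Lucene's CompoundFileWriter
API or its on-disk CFS format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper a codec could call; not part of Lucene. */
final class CompoundFileSketch {

    /** Packs the given files (name -> content) into a single byte array. */
    static byte[] pack(Map<String, byte[]> files) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(files.size());              // number of entries
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());            // file name
                out.writeInt(e.getValue().length);   // content length
                out.write(e.getValue());             // content bytes
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
    }

    /** Reads a blob produced by pack() back into name -> content, in order. */
    static Map<String, byte[]> unpack(byte[] compound) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(compound));
            int count = in.readInt();
            Map<String, byte[]> files = new LinkedHashMap<>();
            for (int i = 0; i < count; i++) {
                String name = in.readUTF();
                byte[] content = new byte[in.readInt()];
                in.readFully(content);
                files.put(name, content);
            }
            return files;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something along these lines, the decision whether to call pack() at all
would sit with the codec, and the indexing code would not need to know about it.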
> Let codec decide to use compound file system or not
> ---------------------------------------------------
>
> Key: LUCENE-2789
> URL: https://issues.apache.org/jira/browse/LUCENE-2789
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Codecs, Index
> Reporter: Simon Willnauer
>
> While working on LUCENE-2186 and in the context of recent [mails |
> http://www.lucidimagination.com/search/document/e75cfa6050d5176/consolidate_mp_and_lmp#97c69a198952ebaa]
> about consolidating MergePolicy and LogMergePolicy, I want to propose a rather
> big change to how Compound Files are created / handled in IW. Since Codecs
> were introduced we have several somewhat different ways of writing data to
> the index. The Sep codec, for instance, writes different files for index data,
> and DocValues will write one file per field and segment. Eventually codecs
> need to have more control over how files are written, i.e. whether CFS should
> be used or not is IMO really a matter of the codec used for writing.
> On the other hand, when you look at IW internals, CFS really pollutes the
> indexing code and relies on information from inside a codec (see
> SegmentWriteState.flushedFiles). In fact, this differentiation spreads across
> many classes related to indexing, including the LogMergePolicy. IMO how newly
> flushed segments are written has nothing to do with MP in the first place, yet
> MP currently chooses whether a newly flushed segment is CFS or not (correct me
> if I am wrong). Pushing all this logic down to codecs would make lots of code
> much easier and cleaner.
> As Mike said, this would also reduce the API footprint if we make it private
> to the codec. I can imagine some situations where you really want control
> over certain fields being stored as non-CFS and others being stored as CFS.
> Codecs might need more information about other segments during a merge to
> decide whether or not to use CFS based on segment size, but we can easily
> change that API. From a reading point of view we already have Codec#files,
> which can decide case by case what files belong to this codec.
> Let me know your thoughts.