[ https://issues.apache.org/jira/browse/LUCENE-2789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966076#action_12966076 ]

Uwe Schindler commented on LUCENE-2789:
---------------------------------------

Can we move the CFS code out into a separate component, so that the codec simply
calls a class/component/whatever during merging and says: "I have these files and
want to create a CFS out of them"? Reading could work in a similar way.
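To make the proposal concrete, here is a minimal Java sketch of such a standalone component, assuming a toy in-memory directory (a map from file name to bytes) in place of Lucene's `Directory`. `MemDir`, `CompoundFileBuilder`, and `CompoundFileReader` are hypothetical names, not Lucene classes, and the layout (entry count, then name/length/payload per file) is an illustrative assumption rather than Lucene's actual CFS format. A codec would call `build` during merging with the files it just wrote; the read side reopens the compound file and hands out sub-files by name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.*;

/** Hypothetical in-memory stand-in for a Lucene Directory: file name -> bytes. */
final class MemDir {
  final Map<String, byte[]> files = new HashMap<>();
}

/** Hypothetical standalone component: packs the files a codec names into one compound file. */
final class CompoundFileBuilder {
  static void build(MemDir dir, String cfsName, List<String> fileNames) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(fileNames.size());              // header: number of entries
      for (String name : fileNames) {
        byte[] data = dir.files.get(name);
        if (data == null) throw new IllegalArgumentException("missing file: " + name);
        out.writeUTF(name);                        // entry name
        out.writeInt(data.length);                 // entry length
        out.write(data);                           // entry payload
      }
      out.flush();
      dir.files.put(cfsName, bytes.toByteArray());
      // The originals are now redundant, so drop them from the directory.
      for (String name : fileNames) dir.files.remove(name);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

/** Hypothetical read side: opens a compound file and exposes its sub-files by name. */
final class CompoundFileReader {
  private final Map<String, byte[]> entries = new LinkedHashMap<>();

  CompoundFileReader(MemDir dir, String cfsName) {
    byte[] raw = dir.files.get(cfsName);
    if (raw == null) throw new IllegalArgumentException("no compound file: " + cfsName);
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        byte[] data = new byte[in.readInt()];
        in.readFully(data);
        entries.put(name, data);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Names of all sub-files stored in the compound file. */
  Set<String> listAll() {
    return Collections.unmodifiableSet(entries.keySet());
  }

  /** Returns a copy of one sub-file's bytes. */
  byte[] open(String name) {
    byte[] data = entries.get(name);
    if (data == null) throw new IllegalArgumentException("not in compound file: " + name);
    return data.clone();
  }
}
```

A codec's merge code would then only need a call like `CompoundFileBuilder.build(dir, "_1.cfs", List.of("_1.frq", "_1.prx"))` (file names hypothetical), which keeps the packing logic out of both the codec internals and IndexWriter.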

> Let codec decide to use compound file system or not
> ---------------------------------------------------
>
>                 Key: LUCENE-2789
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2789
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Codecs, Index
>            Reporter: Simon Willnauer
>
> While working on LUCENE-2186, and in the context of recent
> [mails|http://www.lucidimagination.com/search/document/e75cfa6050d5176/consolidate_mp_and_lmp#97c69a198952ebaa]
> about consolidating MergePolicy and LogMergePolicy, I want to propose a rather
> big change to how compound files are created and handled in IW. Since codecs
> were introduced, we have several somewhat different ways of writing data
> to the index. The Sep codec, for instance, writes separate files for index data,
> and DocValues will write one file per field and segment. Eventually codecs
> need more control over how files are written, i.e. whether CFS should be
> used or not is IMO really a matter of the codec used for writing.
> On the other hand, when you look at IW internals, CFS really pollutes the
> indexing code and relies on information from inside a codec (see
> SegmentWriteState.flushedFiles). Actually, this differentiation spreads across
> many classes related to indexing, including the LogMergePolicy. IMO, how newly
> flushed segments are written has nothing to do with the MP in the first place,
> yet the MP currently chooses whether a newly flushed segment is CFS or not
> (correct me if I am wrong). Pushing all this logic down to codecs would make
> lots of code much simpler and cleaner.
> As Mike said, this would also reduce the API footprint if we make it private
> to the codec. I can imagine some situations where you really want control
> over certain fields being stored as non-CFS and others being stored as CFS.
> Codecs might need more information about other segments during a merge to
> decide whether or not to use CFS based on segment size, but we can easily
> change that API. On the reading side, we already have Codec#files, which
> can decide case by case which files belong to this codec.
> Let me know your thoughts.
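The decision described in the issue could live in a codec-private hook. The Java sketch below is hypothetical throughout: `SegmentStats`, `CompoundFilePolicy`, `SizeAndExtensionPolicy`, the size threshold, and the `dv` extension for per-field DocValues files are made up for illustration. Under this assumed rule, newly flushed segments pack their files into a CFS, merged segments above a size threshold stay non-CFS, and files with excluded extensions always stay outside the compound file, which covers the "some fields non-CFS, others CFS" case.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical per-segment facts a codec might consult when deciding on CFS. */
final class SegmentStats {
  final String segmentName;
  final long sizeInBytes;
  final boolean isMerge; // true if produced by a merge, false if newly flushed

  SegmentStats(String segmentName, long sizeInBytes, boolean isMerge) {
    this.segmentName = segmentName;
    this.sizeInBytes = sizeInBytes;
    this.isMerge = isMerge;
  }
}

/** Hypothetical codec-private hook: picks which of a segment's files go into a CFS. */
interface CompoundFilePolicy {
  /** Returns the subset of files to pack into the compound file; empty means no CFS. */
  Set<String> filesForCompound(SegmentStats stats, Set<String> files);
}

/** One illustrative policy: a size threshold for merges plus per-extension exclusions. */
final class SizeAndExtensionPolicy implements CompoundFilePolicy {
  private final long maxMergedCfsBytes;
  private final Set<String> excludedExtensions;

  SizeAndExtensionPolicy(long maxMergedCfsBytes, Set<String> excludedExtensions) {
    this.maxMergedCfsBytes = maxMergedCfsBytes;
    this.excludedExtensions = Set.copyOf(excludedExtensions);
  }

  @Override
  public Set<String> filesForCompound(SegmentStats stats, Set<String> files) {
    // Illustrative rule: merged segments above the threshold keep every file separate.
    if (stats.isMerge && stats.sizeInBytes > maxMergedCfsBytes) {
      return Set.of();
    }
    Set<String> packed = new TreeSet<>();
    for (String file : files) {
      int dot = file.lastIndexOf('.');
      String ext = dot < 0 ? "" : file.substring(dot + 1);
      // Files with an excluded extension (e.g. hypothetical per-field DocValues) stay outside.
      if (!excludedExtensions.contains(ext)) {
        packed.add(file);
      }
    }
    return packed;
  }
}
```

Because the policy receives a `SegmentStats` object rather than fixed parameters, extra information about other segments during a merge could be added to that object later without changing the hook's signature.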
