Let codec decide to use compound file system or not
---------------------------------------------------

                 Key: LUCENE-2789
                 URL: https://issues.apache.org/jira/browse/LUCENE-2789
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Codecs, Index
            Reporter: Simon Willnauer


While working on LUCENE-2186 and in the context of recent [mails | 
http://www.lucidimagination.com/search/document/e75cfa6050d5176/consolidate_mp_and_lmp#97c69a198952ebaa]
about consolidating MergePolicy and LogMergePolicy, I want to propose a rather 
big change to how compound files (CFS) are created and handled in IW. Since 
codecs were introduced we have several somewhat different ways of writing data 
to the index. The Sep codec, for instance, writes different files for index 
data, and DocValues will write one file per field and segment. Eventually 
codecs need to have more control over how files are written, i.e. whether CFS 
should be used or not is IMO really a matter of the codec used for writing.

On the other hand, when you look at IW internals, CFS really pollutes the 
indexing code and relies on information from inside a codec (see 
SegmentWriteState.flushedFiles). This differentiation actually spreads across 
many classes related to indexing, including the LogMergePolicy. IMO how newly 
flushed segments are written has nothing to do with MP in the first place, yet 
MP currently chooses whether a newly flushed segment is CFS or not (correct me 
if I am wrong). Pushing all this logic down to codecs would make lots of code 
much simpler and cleaner.

As Mike said, this would also reduce the API footprint if we make it private to 
the codec. I can imagine some situations where you really want control over 
certain fields being stored as non-CFS and others being stored as CFS. Codecs 
might need more information about other segments during a merge to decide 
whether or not to use CFS based on segment size, but we can easily change that 
API. On the reading side we already have Codec#files, which can decide case by 
case what files belong to this codec.
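
To make the idea a bit more concrete, here is a minimal sketch of what a 
codec-private CFS decision could look like. All names in it (SegmentSummary, 
CompoundFilePolicy, SizeAndFieldPolicy) are hypothetical and not part of the 
Lucene API. The size threshold and the per-field opt-out are assumptions, used 
only to illustrate the two cases mentioned above.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical summary of a segment handed to the codec; not a Lucene class. */
final class SegmentSummary {
    final String name;
    final long sizeInBytes;

    SegmentSummary(String name, long sizeInBytes) {
        this.name = name;
        this.sizeInBytes = sizeInBytes;
    }
}

/** Hypothetical codec-side hook; the codec, not the MergePolicy, answers it. */
interface CompoundFilePolicy {
    boolean useCompoundFile(SegmentSummary segment, String field);
}

/**
 * Illustrative policy (an assumption, not the proposed implementation):
 * fields listed as standalone are never packed into a compound file, and
 * all other fields use CFS only while the segment is at most maxCfsBytes.
 */
final class SizeAndFieldPolicy implements CompoundFilePolicy {
    private final long maxCfsBytes;
    private final Set<String> standaloneFields;

    SizeAndFieldPolicy(long maxCfsBytes, Set<String> standaloneFields) {
        this.maxCfsBytes = maxCfsBytes;
        // Defensive copy so later changes by the caller do not affect the policy.
        this.standaloneFields = new HashSet<>(standaloneFields);
    }

    @Override
    public boolean useCompoundFile(SegmentSummary segment, String field) {
        if (standaloneFields.contains(field)) {
            return false; // per-field opt-out from CFS
        }
        return segment.sizeInBytes <= maxCfsBytes; // size-based decision
    }
}

/** Small demo that prints the decision for a few segment/field pairs. */
final class CfsPolicyDemo {
    public static void main(String[] args) {
        CompoundFilePolicy policy =
            new SizeAndFieldPolicy(1024L, Collections.singleton("price"));
        SegmentSummary small = new SegmentSummary("_0", 512L);
        SegmentSummary large = new SegmentSummary("_1", 4096L);

        System.out.println("_0/title -> " + policy.useCompoundFile(small, "title"));
        System.out.println("_1/title -> " + policy.useCompoundFile(large, "title"));
        System.out.println("_0/price -> " + policy.useCompoundFile(small, "price"));
    }
}
```

With something like this living inside the codec, IW and MP would not need to 
know about CFS at all. The segment summary is where extra information about 
other segments could be passed in during a merge.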

Let me know your thoughts.



