[ 
https://issues.apache.org/jira/browse/LUCENE-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485182#comment-13485182
 ] 

Robert Muir commented on LUCENE-4509:
-------------------------------------

I'd say that to make progress on the default, we want to look at:
* Make a concrete impl of CompressingStoredFieldsFormat called Lucene41,
  hardwired to the defaults, and add file format docs?
  This way, we don't have to support all of the compression options/layouts
  in the default codec (if someone wants that, encourage them to make their
  own codec with the compression settings they like). Back compat is much
  less costly as the parameters are fixed. File format docs are easier :)
* Should we s/uncompression/decompression/ across the board?
* Tests already look pretty good. I can try to work on some additional ones
  to try to break it like we did with BlockPF.
* There is some scary stuff (literal decompressions etc.) shown as uncovered
  in the clover report:
  https://builds.apache.org/job/Lucene-Solr-Clover-4.x/49/clover-report/org/apache/lucene/codecs/compressing/CompressionMode.html
  We should make sure any special cases are tested.

                
> Make CompressingStoredFieldsFormat the new default StoredFieldsFormat impl
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-4509
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4509
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/store
>            Reporter: Adrien Grand
>            Priority: Minor
>
> What would you think of making CompressingStoredFieldsFormat the new default 
> StoredFieldsFormat?
> Stored fields compression has many benefits:
>  - it makes the I/O cache work for us,
>  - file-based index replication/backup becomes cheaper.
> Things to know:
>  - even with incompressible data, there is less than 0.5% overhead with LZ4,
>  - LZ4 compression requires ~ 16kB of memory and LZ4 HC compression requires 
> ~ 256kB,
>  - LZ4 uncompression has almost no memory overhead,
>  - on my low-end laptop, the LZ4 impl in Lucene uncompresses at ~ 300 MB/s.
> I think we could use the same default parameters as in CompressingCodec :
>  - LZ4 compression,
>  - in-memory stored fields index that is very memory-efficient (less than 12 
> bytes per block of compressed docs) and uses binary search to locate 
> documents in the fields data file,
>  - 16 kB blocks (small enough so that there is no major slow down when the 
> whole index would fit into the I/O cache anyway, and large enough to provide 
> interesting compression ratios; for example Robert got a 0.35 compression 
> ratio with the geonames.org database).
> Any concerns?
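
The layout proposed in the issue (documents grouped into 16 kB blocks, each block compressed, and a small in-memory index that is binary-searched to find the block for a document) can be illustrated with a minimal sketch. This is not Lucene's implementation. The names write_blocks and read_doc, the 4-byte length prefix per document, and the two parallel index lists are assumptions made for illustration only, and zlib stands in for LZ4 because LZ4 is not in the Python standard library.

```python
import bisect
import zlib

CHUNK_SIZE = 16 * 1024  # 16 kB of raw document bytes per block, as in the proposal


def write_blocks(docs):
    """Group documents into ~16 kB blocks and compress each block.

    Returns (data, first_doc_ids, offsets): the concatenated compressed
    blocks plus two parallel lists that form the in-memory index.
    zlib is a stand-in for LZ4 (assumption, see the text above).
    """
    data = bytearray()
    first_doc_ids, offsets = [], []
    pending, pending_size, first_id = [], 0, 0

    def flush():
        nonlocal pending, pending_size, first_id
        if not pending:
            return
        # Length-prefix each document so it can be split out after decompression.
        raw = b"".join(len(d).to_bytes(4, "big") + d for d in pending)
        first_doc_ids.append(first_id)   # first doc ID stored in this block
        offsets.append(len(data))        # where the block starts in the data
        data.extend(zlib.compress(raw))
        first_id += len(pending)
        pending, pending_size = [], 0

    for doc in docs:
        pending.append(doc)
        pending_size += len(doc)
        if pending_size >= CHUNK_SIZE:
            flush()
    flush()
    offsets.append(len(data))  # sentinel: end of the last block
    return bytes(data), first_doc_ids, offsets


def read_doc(data, first_doc_ids, offsets, doc_id):
    """Binary-search the index for the block holding doc_id, then decompress it."""
    block = bisect.bisect_right(first_doc_ids, doc_id) - 1
    raw = zlib.decompress(data[offsets[block]:offsets[block + 1]])
    pos = 0
    # Skip over the documents that precede doc_id inside the block.
    for _ in range(doc_id - first_doc_ids[block]):
        pos += 4 + int.from_bytes(raw[pos:pos + 4], "big")
    length = int.from_bytes(raw[pos:pos + 4], "big")
    return raw[pos + 4:pos + 4 + length]
```

Reading one document costs a binary search over the index plus decompressing a single block, which is why the block size trades retrieval speed against compression ratio: larger blocks compress better but make every lookup decompress more data.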

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
