Adrien Grand created LUCENE-4509:
------------------------------------

             Summary: Make CompressingStoredFieldsFormat the new default StoredFieldsFormat impl
                 Key: LUCENE-4509
                 URL: https://issues.apache.org/jira/browse/LUCENE-4509
             Project: Lucene - Core
          Issue Type: Wish
          Components: core/store
            Reporter: Adrien Grand
            Priority: Minor


What would you think of making CompressingStoredFieldsFormat the new default 
StoredFieldsFormat?

Stored fields compression has many benefits:
 - it makes the I/O cache work for us (compressed documents take less space, so 
more of them fit in the cache),
 - file-based index replication/backup becomes cheaper.

Things to know:
 - even with incompressible data, LZ4 adds less than 0.5% overhead,
 - LZ4 compression requires ~16 kB of memory and LZ4 HC compression requires 
~256 kB,
 - LZ4 decompression has almost no memory overhead,
 - on my low-end laptop, the LZ4 impl in Lucene decompresses at ~300 MB/s.
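The 0.5% figure can be sanity-checked with a little arithmetic. The sketch below assumes the worst-case bound used by the reference LZ4 C implementation (its LZ4_COMPRESSBOUND macro: n + n/255 + 16 bytes for n input bytes); whether Lucene's own LZ4 port has exactly the same bound is an assumption, and the class name Lz4Overhead is hypothetical.

```java
/**
 * Worst-case LZ4 expansion, ASSUMING the bound of the reference C implementation
 * (LZ4_COMPRESSBOUND): n + n/255 + 16 bytes. Lucene's Java port is assumed,
 * not confirmed, to behave similarly.
 */
public class Lz4Overhead {

    /** Worst-case compressed size for n input bytes under the assumed bound. */
    static long worstCaseSize(long n) {
        return n + n / 255 + 16;
    }

    /** Worst-case overhead as a fraction of the input size. */
    static double overheadRatio(long n) {
        return (double) (worstCaseSize(n) - n) / n;
    }

    public static void main(String[] args) {
        long[] sizes = {4 * 1024, 16 * 1024, 64 * 1024};
        for (long n : sizes) {
            System.out.printf("%6d bytes -> at most %6d bytes (%.3f%% overhead)%n",
                    n, worstCaseSize(n), 100 * overheadRatio(n));
        }
    }
}
```

Under this bound, a 16 kB (16,384-byte) block grows to at most 16,464 bytes, about 0.49%, while for a 4 kB block the fixed 16-byte term pushes the worst case to about 0.78%. So the block size matters for how small the overhead on incompressible data stays.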

I think we could use the same default parameters as in CompressingCodec:
 - LZ4 compression,
 - in-memory stored fields index that is very memory-efficient (less than 12 
bytes per block of compressed docs) and uses binary search to locate documents 
in the fields data file,
 - 16 kB blocks (small enough that there is no major slowdown when the 
whole index fits into the I/O cache anyway, and large enough to provide 
interesting compression ratios; for example, Robert got a 0.35 compression 
ratio with the geonames.org database).
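The stored fields index described in the list above (one small in-memory entry per block of compressed docs, with binary search to find the block holding a document) might look roughly like the following Java sketch. Everything here is an illustrative assumption: the class ChunkIndexSketch and its methods are hypothetical, per-block compression is left out, and the index is kept as plain int/long arrays, i.e. exactly 12 bytes per block, whereas the proposal states the real index needs less than that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of a block-based stored fields index (not Lucene's code).
 * Documents are buffered into chunks of about CHUNK_SIZE bytes; for each chunk
 * the index records its first doc ID and its start offset in the data file.
 * Per-chunk compression is omitted to keep the example short.
 */
public class ChunkIndexSketch {

    static final int CHUNK_SIZE = 16 * 1024; // 16 kB blocks, as proposed in the issue

    // One entry per chunk, stored uncompressed: 4 + 8 = 12 bytes per chunk.
    private final int[] docBases;       // first doc ID of each chunk, strictly increasing
    private final long[] startPointers; // offset of each chunk in the data file

    private ChunkIndexSketch(int[] docBases, long[] startPointers) {
        this.docBases = docBases;
        this.startPointers = startPointers;
    }

    /** Builds the index from per-document serialized sizes (in bytes). */
    static ChunkIndexSketch build(int[] docSizes) {
        List<Integer> bases = new ArrayList<>();
        List<Long> pointers = new ArrayList<>();
        long filePointer = 0;  // start of the current chunk in the data file
        int bufferedBytes = 0; // bytes buffered for the current chunk
        int chunkFirstDoc = 0;
        for (int doc = 0; doc < docSizes.length; doc++) {
            bufferedBytes += docSizes[doc];
            boolean last = doc == docSizes.length - 1;
            if (bufferedBytes >= CHUNK_SIZE || last) {
                bases.add(chunkFirstDoc);
                pointers.add(filePointer);
                // A real writer would compress the chunk here; we advance by its raw size.
                filePointer += bufferedBytes;
                bufferedBytes = 0;
                chunkFirstDoc = doc + 1;
            }
        }
        int[] b = new int[bases.size()];
        long[] p = new long[pointers.size()];
        for (int i = 0; i < b.length; i++) {
            b[i] = bases.get(i);
            p[i] = pointers.get(i);
        }
        return new ChunkIndexSketch(b, p);
    }

    int numChunks() {
        return docBases.length;
    }

    /** Binary search for the chunk holding docID (docID must be a valid, indexed doc). */
    int chunkOf(int docID) {
        int i = Arrays.binarySearch(docBases, docID);
        // Exact hit: docID is the first doc of chunk i. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the doc belongs to the chunk before the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    /** Offset in the data file where a reader would start decompressing. */
    long startPointerOf(int docID) {
        return startPointers[chunkOf(docID)];
    }

    public static void main(String[] args) {
        int[] docSizes = new int[10];
        Arrays.fill(docSizes, 5000); // ten 5,000-byte documents
        ChunkIndexSketch index = build(docSizes);
        System.out.println("chunks: " + index.numChunks());
        System.out.println("doc 7 -> chunk " + index.chunkOf(7)
                + ", starting at byte " + index.startPointerOf(7));
    }
}
```

With ten 5,000-byte documents, a chunk is flushed each time 20,000 bytes are buffered (after docs 3 and 7), giving three chunks that start at docs 0, 4 and 8; doc 7 falls in chunk 1, which starts at byte 20,000. The lookup only touches the small in-memory arrays, so locating a document costs O(log #chunks) before a single seek into the data file.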

Any concerns?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

