[
https://issues.apache.org/jira/browse/LUCENE-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485163#comment-13485163
]
Robert Muir commented on LUCENE-4509:
-------------------------------------
I think it's OK too. I just didn't know if we could do something trivial like
storing the offsets-within-the-blocks as packed ints,
so that it optimizes for this case anyway (offset=0) and only takes
8 bytes + 1 bit instead of 12 bytes.
But I don't have a real understanding of what this thing does when docsize >
blocksize; I haven't dug in that much.
In any case I think it should be the default: it's fast and also works for tiny
documents with lots of fields.
I think people expect the index to be compressed in some way, and the stored
fields are really wasteful today.
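
To make the "8 bytes + 1 bit instead of 12 bytes" arithmetic concrete, here is a rough sizing sketch. It is not Lucene code: the class and method names are hypothetical, and it assumes each index entry is a fixed 64-bit file pointer plus an offset-within-block packed at the minimum bit width (at least 1 bit), which is only one reading of the idea above.

```java
public final class PackedOffsetSizing {
    // Bits needed to represent any value in [0, maxValue].
    // Assumption: a packed encoding spends at least 1 bit per value, even when every value is 0.
    public static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be >= 0");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Total index size in bits: a 64-bit pointer plus one packed offset per block.
    public static long packedIndexBits(long numBlocks, long maxOffset) {
        return numBlocks * (64L + bitsRequired(maxOffset));
    }

    // Total index size in bits with a fixed 12 bytes (96 bits) per block.
    public static long fixedIndexBits(long numBlocks) {
        return numBlocks * 96L;
    }

    public static void main(String[] args) {
        long blocks = 1_000_000L;
        // All offsets are 0 (the common case discussed above): 65 bits per block.
        System.out.println("packed, offset=0: " + packedIndexBits(blocks, 0) / 8 + " bytes");
        System.out.println("fixed 12 bytes:   " + fixedIndexBits(blocks) / 8 + " bytes");
    }
}
```

Under these assumptions the saving shrinks as offsets grow, since an offset anywhere inside a 16 kB block would need 14 bits rather than 1.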
> Make CompressingStoredFieldsFormat the new default StoredFieldsFormat impl
> --------------------------------------------------------------------------
>
> Key: LUCENE-4509
> URL: https://issues.apache.org/jira/browse/LUCENE-4509
> Project: Lucene - Core
> Issue Type: Wish
> Components: core/store
> Reporter: Adrien Grand
> Priority: Minor
>
> What would you think of making CompressingStoredFieldsFormat the new default
> StoredFieldsFormat?
> Stored fields compression has many benefits:
> - it makes the I/O cache work for us,
> - file-based index replication/backup becomes cheaper.
> Things to know:
> - even with incompressible data, there is less than 0.5% overhead with LZ4,
> - LZ4 compression requires ~ 16kB of memory and LZ4 HC compression requires
> ~ 256kB,
> - LZ4 uncompression has almost no memory overhead,
> - on my low-end laptop, the LZ4 impl in Lucene uncompresses at ~ 300 MB/s.
> I think we could use the same default parameters as in CompressingCodec :
> - LZ4 compression,
> - in-memory stored fields index that is very memory-efficient (less than 12
> bytes per block of compressed docs) and uses binary search to locate
> documents in the fields data file,
> - 16 kB blocks (small enough so that there is no major slow down when the
> whole index would fit into the I/O cache anyway, and large enough to provide
> interesting compression ratios; for example Robert got a 0.35 compression
> ratio with the geonames.org database).
> Any concerns?
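
For readers unfamiliar with the in-memory stored fields index mentioned in the description, the lookup can be pictured roughly as below. This is a minimal sketch, not the actual CompressingStoredFieldsFormat implementation: the class name, field names, and the plain-array layout are assumptions made for illustration. It keeps the first doc ID and the file start pointer of each compressed block and binary-searches for the block that contains a given document.

```java
import java.util.Arrays;

public final class BlockIndexSketch {
    private final int[] docBases;       // first doc ID stored in each block, strictly ascending
    private final long[] startPointers; // offset of each block in the fields data file

    public BlockIndexSketch(int[] docBases, long[] startPointers) {
        if (docBases.length == 0 || docBases.length != startPointers.length || docBases[0] != 0) {
            throw new IllegalArgumentException("need one pointer per block, first block starting at doc 0");
        }
        this.docBases = docBases.clone();
        this.startPointers = startPointers.clone();
    }

    // Index of the block holding docId: the last block whose first doc ID is <= docId.
    public int blockOf(int docId) {
        if (docId < 0) {
            throw new IllegalArgumentException("docId must be >= 0");
        }
        int i = Arrays.binarySearch(docBases, docId);
        // On a miss, binarySearch returns -(insertionPoint) - 1; the wanted block is insertionPoint - 1.
        return i >= 0 ? i : -i - 2;
    }

    // File offset at which to start reading and decompressing to reach docId.
    public long startPointer(int docId) {
        return startPointers[blockOf(docId)];
    }

    public static void main(String[] args) {
        BlockIndexSketch index = new BlockIndexSketch(
                new int[] {0, 10, 25}, new long[] {0L, 4096L, 9000L});
        System.out.println(index.blockOf(12) + " @ " + index.startPointer(12));
    }
}
```

This naive layout costs exactly 12 bytes per block (a 4-byte int plus an 8-byte long); the description says the real index stays under that, so it presumably uses a more compact encoding.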