On 19/07/2012 14:26, Simon McDuff wrote:

I'm using Lucene 4.0.

I'm inserting around 300,000 documents per second.

We do not have any stored fields, but we noticed that the .fdt file gets populated anyway.

.fdx contains useless information.
.fdt contains only zeros, which is useless.

Is there a way to minimize the impact?

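This happens because the Lucene40StoredFieldsWriter (part of the Lucene40 codec) uses a simplistic layout for the data. For every document it writes a long (8 bytes) to the .fdx file to mark the position of that document's field data, then a VInt (at least one byte) to the .fdt file to record the number of fields, followed by the actual stored field data. A document without stored fields therefore still costs at least 9 bytes: 8 in .fdx plus a single zero byte in .fdt, which is why your .fdt file contains only zeros.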
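To put a rough number on it, here is a back-of-the-envelope sketch of the minimum overhead at the ingest rate you quoted. It is not Lucene code: the class and method names are made up for illustration, and it only counts the 8-byte .fdx entry and the one-byte field count in .fdt described above.

```java
// Hypothetical helper, not part of Lucene: estimates the minimum
// stored-fields overhead for documents that have no stored fields.
final class StoredFieldsOverhead {
    // One long per document in the .fdx file.
    static final int FDX_BYTES_PER_DOC = 8;
    // A VInt field count of 0 takes a single byte in the .fdt file.
    static final int FDT_MIN_BYTES_PER_DOC = 1;

    static long minBytesPerDoc() {
        return FDX_BYTES_PER_DOC + FDT_MIN_BYTES_PER_DOC;
    }

    static long minBytesPerSecond(long docsPerSecond) {
        return docsPerSecond * minBytesPerDoc();
    }

    public static void main(String[] args) {
        long rate = 300_000L; // documents per second, from the question
        System.out.println(minBytesPerDoc() + " bytes/doc, "
                + minBytesPerSecond(rate) + " bytes/s");
    }
}
```

At 300,000 documents per second that works out to 2,700,000 bytes (about 2.7 MB) written per second for data that carries no information.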

We could modify this format to be less verbose for documents without stored fields, e.g. by using block-delta encoding for the .fdx file and not writing anything to the .fdt file when a document has no stored fields. The question is whether the space savings would be worth the added complexity.
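As a rough illustration of why block-delta encoding would help, here is a minimal sketch of the size calculation for one block of .fdx pointers. This is only an assumption about how such a format might look, not an existing Lucene format: the class name, the block header layout (8-byte base pointer plus 1-byte bit width) and the requirement that pointers are non-decreasing are all hypothetical.

```java
// Hypothetical sketch, not an existing Lucene format: estimates the size
// of one block of .fdx start pointers when stored as a base pointer plus
// fixed-width deltas between consecutive pointers.
final class BlockDeltaSketch {
    // Bits needed to represent the largest delta in the block.
    // Assumes pointers are non-decreasing, as file positions would be.
    static int bitsPerDelta(long[] pointers) {
        long maxDelta = 0;
        for (int i = 1; i < pointers.length; i++) {
            maxDelta = Math.max(maxDelta, pointers[i] - pointers[i - 1]);
        }
        // numberOfLeadingZeros(0) is 64, so an all-zero block needs 0 bits.
        return 64 - Long.numberOfLeadingZeros(maxDelta);
    }

    // Assumed block layout: 8-byte base pointer, 1-byte bit width,
    // then the deltas packed at that width and rounded up to whole bytes.
    static long encodedBytes(long[] pointers) {
        if (pointers.length == 0) {
            return 0;
        }
        long payloadBits = (long) bitsPerDelta(pointers) * (pointers.length - 1);
        return 8 + 1 + (payloadBits + 7) / 8;
    }

    public static void main(String[] args) {
        // 1024 documents without stored fields and nothing written to .fdt:
        // every pointer is identical, so every delta is zero.
        long[] noStoredFields = new long[1024];
        System.out.println("block-delta: " + encodedBytes(noStoredFields)
                + " bytes, current layout: " + (8L * noStoredFields.length) + " bytes");
    }
}
```

Under these assumptions, a block of 1024 documents without stored fields shrinks from 8192 bytes to 9 bytes in .fdx, and .fdt stays empty.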

--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com

