Re: Lucene 4.0 .FDT

Andrzej Bialecki Thu, 19 Jul 2012 07:45:01 -0700

On 19/07/2012 14:26, Simon McDuff wrote:


I'm using Lucene 4.0.

I'm inserting around 300,000 documents per second.

We do not have any stored fields, but we noticed that the .fdt file gets populated even so.

.fdx contains useless information.
.fdt contains only zeros, which is useless as well.

Is there a way to minimize the impact?

This happens because the Lucene40StoredFieldsWriter (part of the Lucene40 Codec) uses a simplistic layout for the data - for every document it writes a long to the .fdx file (8 bytes) to mark the position of the fields' data, and a vint to the .fdt file (at least one byte) to record the number of fields, and then the actual stored fields' data.
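
To make the overhead concrete, here is a rough sketch of the layout described above. It is not the actual Lucene40StoredFieldsWriter code: the class and method names are made up for illustration, file headers are ignored, and in-memory buffers stand in for the .fdx and .fdt files. It only reproduces the "one long per document in .fdx, one vint field count in .fdt" pattern for documents that have no stored fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical name; a simulation of the layout, not Lucene code.
class StoredFieldsOverheadSketch {

    // Variable-length int: 7 payload bits per byte, high bit set on every
    // byte except the last one. The value 0 is encoded as a single 0x00 byte.
    static void writeVInt(DataOutputStream out, int value) throws IOException {
        while ((value & ~0x7F) != 0) {
            out.writeByte((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.writeByte(value);
    }

    // Returns {bytes written to "fdx", bytes written to "fdt"} for numDocs
    // documents that have no stored fields at all.
    static long[] simulate(int numDocs) throws IOException {
        DataOutputStream fdx = new DataOutputStream(new ByteArrayOutputStream());
        DataOutputStream fdt = new DataOutputStream(new ByteArrayOutputStream());
        for (int doc = 0; doc < numDocs; doc++) {
            fdx.writeLong(fdt.size()); // 8 bytes: where this doc's data starts in fdt
            writeVInt(fdt, 0);         // 1 byte: field count, zero for every doc
        }
        return new long[] { fdx.size(), fdt.size() };
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = simulate(300_000);
        System.out.println("fdx bytes: " + sizes[0]);
        System.out.println("fdt bytes: " + sizes[1]);
    }
}
```

Under these assumptions every document costs 9 bytes even though nothing is stored, and the .fdt side consists purely of zero bytes, which matches what you are seeing.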

We could modify this format to be less verbose for documents without stored fields, e.g. use block-delta encoding of the .fdx file and avoid writing anything to the .fdt file if there are no stored fields. The question is whether the space savings would be worth the complication?
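
As a back-of-the-envelope illustration of that idea, the sketch below estimates the size of a block-delta encoded index. Everything in it is an assumption made for the example and not an existing Lucene format: the block size of 1024 documents, the per-block header (an 8-byte base position plus one byte for the bit width), and the class name are all hypothetical. Positions are assumed to be non-decreasing.

```java
// Hypothetical name and format; a size estimate only, not an encoder.
class BlockDeltaIndexSketch {

    static final int BLOCK_SIZE = 1024; // assumed number of documents per block

    // Estimates the encoded size in bytes of the given .fdt start positions.
    static long encodedSize(long[] positions) {
        long total = 0;
        for (int start = 0; start < positions.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, positions.length);
            long maxDelta = 0;
            for (int i = start + 1; i < end; i++) {
                maxDelta = Math.max(maxDelta, positions[i] - positions[i - 1]);
            }
            // Bits needed for the largest delta; 0 when all deltas are zero.
            int bitsPerDelta = 64 - Long.numberOfLeadingZeros(maxDelta);
            long deltaBits = (long) (end - start - 1) * bitsPerDelta;
            // 8-byte base position + 1-byte bit width + packed deltas.
            total += 8 + 1 + (deltaBits + 7) / 8;
        }
        return total;
    }

    public static void main(String[] args) {
        // No stored fields and nothing written to .fdt: every position is 0.
        long[] positions = new long[300_000];
        System.out.println("block-delta bytes: " + encodedSize(positions));
        System.out.println("plain long bytes:  " + 8L * positions.length);
    }
}
```

When no document has stored fields, all deltas are zero and each block shrinks to its header, so the index cost no longer grows by 8 bytes per document. The price is the extra complexity of decoding a block to look up a single document's position.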


--
Best regards,
Andrzej Bialecki
http://www.sigram.com, blog http://www.sigram.com/blog
 ___.,___,___,___,_._. __________________<><____________________
[___||.__|__/|__||\/|: Information Retrieval, System Integration
___|||__||..\|..||..|: Contact: info at sigram dot com


