date:20190701

find documents with big stored fields

2019-07-01 Thread Rob Audenaerde

Hello, we are currently trying to investigate an issue wherein the index size is disproportionately large for the number of documents. We see that the .fdt file is more than 10 times the regular size. Reading the docs, I found that this file contains the stored field data. I would like to find the docum

Re: find documents with big stored fields

2019-07-01 Thread Michael McCandless

Hi Rob, The codec records per docid how many bytes each document consumes -- maybe instrument the codec's sources locally, then open your index and have it visit stored fields for every doc in the index and gather stats? Or, to avoid touching Lucene-level code, you could make a small tool that lo
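As a rough illustration of the "visit stored fields for every doc and gather stats" idea, here is a minimal sketch in plain Java. It is not the tool described in the message and it does not call Lucene: `StoredFieldSource` is a hypothetical interface standing in for whatever reads a document's stored fields (in practice that would be backed by a Lucene index reader), and sizes are measured as uncompressed UTF-8 bytes of string values, which is an assumption rather than what the codec records.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class LargestStoredDocs {

    /** Hypothetical stand-in for an index reader; not a Lucene type. */
    interface StoredFieldSource {
        int maxDoc();                                  // number of doc ids to visit
        Map<String, String> storedFields(int docId);   // field name -> stored value
    }

    /** A doc id paired with the raw size of its stored values. */
    static final class DocSize {
        final int docId;
        final long bytes;

        DocSize(int docId, long bytes) {
            this.docId = docId;
            this.bytes = bytes;
        }
    }

    /** Sums the UTF-8 byte length of every stored value in one document. */
    static long storedBytes(Map<String, String> fields) {
        long total = 0;
        for (String value : fields.values()) {
            total += value.getBytes(StandardCharsets.UTF_8).length;
        }
        return total;
    }

    /** Visits every doc id and returns the n largest documents, biggest first. */
    static List<DocSize> largest(StoredFieldSource source, int n) {
        Comparator<DocSize> bySize = Comparator.comparingLong((DocSize d) -> d.bytes);
        // Min-heap capped at n entries: the smallest candidate is evicted first.
        PriorityQueue<DocSize> heap = new PriorityQueue<>(bySize);
        for (int docId = 0; docId < source.maxDoc(); docId++) {
            heap.add(new DocSize(docId, storedBytes(source.storedFields(docId))));
            if (heap.size() > n) {
                heap.poll();
            }
        }
        List<DocSize> result = new ArrayList<>(heap);
        result.sort(bySize.reversed());
        return result;
    }
}
```

The bounded heap keeps memory proportional to n rather than to the number of documents, which matters when the index is large.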

Re: find documents with big stored fields

2019-07-01 Thread Erick Erickson

Whoa. First, it should be pretty easy to figure out which fields are large: just look at your input documents. The fdt files are really simple; they’re just the compressed raw data. Numeric fields, for instance, are just character data in the fdt files. We usually see about a 2:1 ratio. There’s
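The "just look at your input documents" check can be sketched as a per-field tally over the source documents, before anything reaches the index. This is only an illustration under stated assumptions: documents are modeled as hypothetical `Map<String, String>` records (field name to value), and size is the raw UTF-8 byte length, so the totals are uncompressed figures and not what ends up in the fdt file.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FieldSizeTally {

    /** Totals the raw UTF-8 bytes contributed by each field across all documents. */
    static Map<String, Long> rawBytesPerField(List<Map<String, String>> docs) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, String> doc : docs) {
            for (Map.Entry<String, String> field : doc.entrySet()) {
                long size = field.getValue().getBytes(StandardCharsets.UTF_8).length;
                totals.merge(field.getKey(), size, Long::sum);
            }
        }
        return totals;
    }

    /** Returns the per-field totals ordered from largest to smallest. */
    static List<Map.Entry<String, Long>> largestFirst(Map<String, Long> totals) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries;
    }
}
```

If one field dominates the tally, that field is the first place to look for unexpectedly large values.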
