Thanks for the explanation, Adrien. I do have a couple of follow-up
questions. Isn't this block size used for file caching OS-dependent? And if
4K happens to be the most commonly used size, wouldn't it make more sense
for the default stored fields format to have a chunk size equal to or
smaller tha
Hi Vitaly,
Doc values are indeed well-suited for grouping and sorting. However,
stored fields remain better at returning field values to users since
they guarantee a worst-case of one disk seek per document.
The filesystem cache typically caches data in 4KB blocks. This
plays more nicely with d
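
As a rough illustration of the 4KB block point above, the sketch below counts how many cache blocks a single read spans. It is only arithmetic on byte ranges: the function name, offsets, and lengths are hypothetical, and it does not model Lucene's actual on-disk layout.

```python
PAGE_SIZE = 4096  # 4KB, the typical filesystem cache block size mentioned above


def pages_touched(offset, length, page_size=PAGE_SIZE):
    """Return how many cache blocks a read of `length` bytes at `offset` spans."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


if __name__ == "__main__":
    # A block-aligned 4KB read stays within one block.
    print(pages_touched(0, 4096))
    # A small read that straddles a block boundary touches two blocks.
    print(pages_touched(4000, 200))
```

Reads that are small and aligned stay within one cached block, while a read that crosses a boundary, or that is larger than a block, pulls in several.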
I use stored fields to load values for the following use cases:
- to return per-document values as is, requested by the user - similar to
listing DB columns you are interested in, in a "select ..." clause.
- to perform aggregate function calculations while forming the result set
(if requested).
- f
Hi,
What are you doing with the stored fields? They are not deprecated and also not
really slow, unless you scan over millions of documents in random access order.
To display search results, DocValues are of no use.
Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetap
Hi,
> The use-case is that some of the fields in the document are made up of
> term:frequency pairs. What I am doing right now is to expand these with a
> TokenFilter, so that, e.g., for "dog:3 cat:2", I return "dog dog dog cat cat",
> and
> index that. However, the problem is that when these field
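
For readers unfamiliar with the expansion described in the quoted text, here is a minimal sketch of the idea in plain Python. It is not a Lucene TokenFilter and not the poster's code; the function name is hypothetical, and it assumes every token has the form term:frequency with an integer frequency.

```python
def expand_term_frequencies(text):
    """Expand whitespace-separated 'term:freq' pairs into repeated terms."""
    tokens = []
    for pair in text.split():
        # Split on the last colon so terms that contain ':' still parse.
        term, _, freq = pair.rpartition(":")
        tokens.extend([term] * int(freq))
    return tokens


if __name__ == "__main__":
    print(" ".join(expand_term_frequencies("dog:3 cat:2")))
    # prints "dog dog dog cat cat"
```

The expanded token stream grows linearly with the sum of the frequencies, so large frequency values produce very long inputs.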
Hi David,
I'm not an expert, but I've climbed through the consumers myself in the
past. The big limit is that the full postings for a document or document
block must fit into memory. There may be other hidden processing limits
(i.e. memory used per field).
I think it would be possible to create
I have heard here that stored fields don't work well with OS file caching.
Could someone elaborate on why that is? I am using Lucene 4.6 and we do use
stored fields but not doc values; it appears most of the benefit from the
latter comes as improvement in sorting performance, and I don't actually
u
Hi guys,
I have just recently (re-)joined the list. I have an issue with indexing; I hope
someone can help me with it.
The use-case is that some of the fields in the document are made up of
term:frequency pairs. What I am doing right now is to expand these with a
TokenFilter, so that, e.g., for "do