Given a non-tokenized field that has DocValues, the primary (maybe even
the only?) reason for also making it stored seems to be document
retrieval. When the goal is to construct documents, the basic difference
between returning only the stored values and returning both stored and
DocValued values seems to be performance: resolving a non-trivial number
of stored values for each document is mostly a bulk operation, while
resolving the DocValued ones is closer to random access.

In most of our setups, search results are divided between overviews
(classic top-10 or top-20 with the most relevant documents) and expanded
views (a separate page or a result box that changes size). The overviews
show little data per document and the expanded views show more. The data
for overviews needs to be provided quickly (stored), whereas the expanded
views are one-document-at-a-time and thus do not have the same time
requirements (DocValue speed is fine).

As non-trivial space (15% in an index I am investigating) can be saved
by using DocValues without storing, would it be an idea to provide
support for retrieving DocValued fields as part of document retrieval?

This could be done in different ways:

* Only return stored values with fl=*. If a field is referenced
  explicitly with fl=myfield and is DocValued but not stored, return
  the DocValued value.

* Only return DocValued fields that are not stored when a flag is
  set: resolvedv=true
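
To make the two options concrete, here is a rough sketch in Python. It
is not Solr code, only a model of the selection rules. FieldDef,
resolve_option1 and resolve_option2 are names made up for illustration,
and resolvedv is the flag proposed above, not an existing parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical stand-in for a schema field definition."""
    name: str
    stored: bool
    doc_values: bool


def resolve_option1(schema, fl):
    """Option 1: fl=* yields stored fields only. A field that is named
    explicitly and is DocValued but not stored comes from DocValues."""
    result = {}
    for field in schema:
        if field.stored and ("*" in fl or field.name in fl):
            result[field.name] = "stored"
        elif field.doc_values and field.name in fl:
            result[field.name] = "docvalues"
    return result


def resolve_option2(schema, fl, resolvedv=False):
    """Option 2: DocValued fields that are not stored are returned
    only when the (proposed) resolvedv flag is set."""
    result = {}
    for field in schema:
        if not ("*" in fl or field.name in fl):
            continue
        if field.stored:
            result[field.name] = "stored"
        elif field.doc_values and resolvedv:
            result[field.name] = "docvalues"
    return result


# Made-up example schema: author_dv is DocValued but not stored.
schema = [
    FieldDef("id", stored=True, doc_values=False),
    FieldDef("title", stored=True, doc_values=True),
    FieldDef("author_dv", stored=False, doc_values=True),
]

overview = resolve_option1(schema, ["*"])
expanded = resolve_option1(schema, ["id", "author_dv"])
flagged = resolve_option2(schema, ["*"], resolvedv=True)
unflagged = resolve_option2(schema, ["author_dv"])
```

With the first option, the overview request (fl=*) never touches
DocValues, while the expanded view can ask for author_dv by name. With
the second option, nothing changes for existing requests unless the
flag is given.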


- Toke Eskildsen, State and University Library, Denmark


