[ 
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977575#comment-14977575
 ] 

Yonik Seeley commented on SOLR-8220:
------------------------------------

bq. reading values from docValues when a stored version of the field does not 
exist would be a valuable disk usage optimization.

+1, and I've heard a number of users request this.

bq. 1) There doesn't seem to be a standard way to read values for docValues, 
facets, analytics, streaming, etc, all seem to be doing their own ways, perhaps 
some of this logic should be centralized.

See ReturnFields / ResultContext. That's currently where stored field handling 
is centralized, and it covers anywhere a field list (or pseudo-fields / 
transformers) is specified.

+1 for 2a as a first pass.
For bonus points, prevent stored fields from being loaded at all when not 
needed.  This gets us a big step closer to having the normal request handler 
have the same performance as "/export".
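
A minimal sketch of the "don't load stored fields when not needed" check, 
assuming the requested field names and the set of stored field names are 
already known. StoredLoadSketch and needsStoredFields are hypothetical names 
for illustration, not existing Solr classes or methods.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; nothing here is a Solr API.
final class StoredLoadSketch {

    // Returns true only if at least one requested field must be read from
    // stored fields. When it returns false, the stored document would not
    // need to be loaded for this request at all.
    static boolean needsStoredFields(List<String> requested, Set<String> storedFields) {
        for (String name : requested) {
            if (storedFields.contains(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stored = Set.of("title");
        // "price" is assumed to be a docValues-only field in this example.
        System.out.println(needsStoredFields(List.of("price"), stored));
        System.out.println(needsStoredFields(List.of("title", "price"), stored));
    }
}
```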

Looking beyond the first pass, it might be nice to use docValues as more of a 
first-class alternate "stored" mechanism, and consider them part of "*".  If 
for some reason it's desirable to treat some docValues fields as stored, and 
others not, we could introduce a flag on <field> in the schema.
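
To make the two behaviors concrete, here is a rough sketch of how the source 
of a field value might be chosen: approach 2a for a field named explicitly in 
fl, and a per-field flag deciding whether a docValues-only field is part of 
"*". Every name here (FieldSourceSketch, FieldInfo, treatAsStored) is 
hypothetical and only stands in for the schema flag idea; none of it is an 
existing Solr class or attribute.

```java
// Hypothetical illustration only; nothing here is a Solr API.
final class FieldSourceSketch {

    enum Source { STORED, DOC_VALUES, NONE }

    // Minimal stand-in for a schema field definition.
    static final class FieldInfo {
        final boolean stored;
        final boolean docValues;
        final boolean treatAsStored; // hypothetical per-field schema flag

        FieldInfo(boolean stored, boolean docValues, boolean treatAsStored) {
            this.stored = stored;
            this.docValues = docValues;
            this.treatAsStored = treatAsStored;
        }
    }

    // Approach 2a: a field named explicitly in fl is read from stored fields
    // if it is stored, otherwise from docValues if it has them.
    static Source resolveExplicit(FieldInfo f) {
        if (f.stored) return Source.STORED;
        if (f.docValues) return Source.DOC_VALUES;
        return Source.NONE;
    }

    // fl=*: stored fields, plus docValues-only fields that opt in via the flag.
    static Source resolveStar(FieldInfo f) {
        if (f.stored) return Source.STORED;
        if (f.docValues && f.treatAsStored) return Source.DOC_VALUES;
        return Source.NONE;
    }

    public static void main(String[] args) {
        FieldInfo dvOnly = new FieldInfo(false, true, false);
        FieldInfo dvOptedIn = new FieldInfo(false, true, true);
        System.out.println(resolveExplicit(dvOnly)); // explicit fl=field
        System.out.println(resolveStar(dvOnly));     // fl=*, flag off
        System.out.println(resolveStar(dvOptedIn));  // fl=*, flag on
    }
}
```

With the flag off, the "*" behavior matches the first pass (stored fields 
only); with it on, a docValues-only field is returned as if it were stored.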

bq. The only caveat with this that I can see would be for multiValued fields as 
they would always be returned sorted in the docValues approach. I believe this 
is a fair compromise.

This shouldn't be much of a concern for approach 2a, but another future option 
would be to add explicit set types, and also implement list-type multi-valued 
docValues fields (probably using binary docValues under the covers).


> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as 
> they would always be returned sorted in the docValues approach. I believe 
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues, 
> facets, analytics, streaming, etc, all seem to be doing their own ways, 
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return field from docValue if the field is not stored and in docValues, 
> if the field is stored return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first 
> pass. 2b - is current behavior


