[jira] [Updated] (SOLR-8220) Read field from docValues for non stored fields

Ishan Chattopadhyaya (JIRA) Fri, 18 Dec 2015 01:02:21 -0800

     [ https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ishan Chattopadhyaya updated SOLR-8220:
---------------------------------------
    Attachment: SOLR-8220.patch

Thanks for your review, Shalin. I've updated the patch to address your 
suggestions.

bq.     The SolrIndexSearcher.decorateDocValueFields method has a 
honourUseDVsAsStoredFlag which is always true. We can remove it?
bq.    Same for SolrIndexSearcher.getNonStoredDocValuesFieldNames?

Refactored decorateDocValues() a bit so that it no longer takes the 
wantsAllFields flag; that is now handled in DocsStreamer itself. As a result, 
decorateDocValues() takes in only the field names it actually needs to act on, 
and the filtering of non-stored DVs is done in DocsStreamer.next(). 
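The split described above (the caller decides which non-stored DV fields are wanted; the decorate step only adds those) can be sketched roughly as follows. This is a hypothetical illustration, not the patch's code: the document is modelled as a plain Map instead of a SolrDocument, and dvValues stands in for values that have already been read from docValues for the current document.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a Map stands in for SolrDocument, and dvValues stands in
// for per-document values already read from docValues.
final class DocValuesDecorator {
  /**
   * Returns a copy of doc with the requested non-stored docValues fields added.
   * The caller has already decided which names belong in dvFieldsToAdd, so no
   * wantsAllFields-style flag is needed here.
   */
  static Map<String, Object> decorateDocValues(Map<String, Object> doc,
                                               Set<String> dvFieldsToAdd,
                                               Map<String, Object> dvValues) {
    Map<String, Object> out = new LinkedHashMap<>(doc);
    for (String field : dvFieldsToAdd) {
      Object value = dvValues.get(field);
      if (value != null) {
        out.put(field, value); // fields with no docValues entry for this doc are skipped
      }
    }
    return out;
  }
}
```

Values for fields that were not requested (e.g. another DV present in dvValues) are never copied, because only names in dvFieldsToAdd are looked up.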

For the {{fl=\*}} case we need all non-stored DVs that have 
{{useDocValuesAsStored}}=true, but for explicit field lists such as 
{{fl=dv1,dv2}} we need to filter against all non-stored DVs (irrespective of the 
useDocValuesAsStored flag). So I've retained this true/false logic in the 
getNonStoredDocValuesFieldNames() method, but renamed it to 
{{getNonStoredDVs(boolean onlyUseDocValuesAsStored)}} and added a clear javadoc 
to this effect.
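A minimal sketch of the two modes of getNonStoredDVs(boolean onlyUseDocValuesAsStored). FieldDef is a hypothetical stand-in for a schema field (not Solr's SchemaField API), and the sketch takes an explicit list of fields rather than reading the searcher's schema.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for a schema field; not Solr's SchemaField.
final class FieldDef {
  final String name;
  final boolean stored;
  final boolean docValues;
  final boolean useDocValuesAsStored;

  FieldDef(String name, boolean stored, boolean docValues, boolean useDocValuesAsStored) {
    this.name = name;
    this.stored = stored;
    this.docValues = docValues;
    this.useDocValuesAsStored = useDocValuesAsStored;
  }
}

final class NonStoredDVs {
  /**
   * Returns the names of fields that have docValues but are not stored.
   *
   * @param onlyUseDocValuesAsStored true for the fl=* case (keep only fields flagged
   *        useDocValuesAsStored); false for explicit field lists (keep every
   *        non-stored docValues field, regardless of the flag)
   */
  static Set<String> getNonStoredDVs(List<FieldDef> schema, boolean onlyUseDocValuesAsStored) {
    Set<String> result = new LinkedHashSet<>();
    for (FieldDef f : schema) {
      if (f.stored || !f.docValues) {
        continue; // stored fields are read from stored fields; no docValues means nothing to read
      }
      if (onlyUseDocValuesAsStored && !f.useDocValuesAsStored) {
        continue; // excluded from fl=*, but still returned when named explicitly
      }
      result.add(f.name);
    }
    return Collections.unmodifiableSet(result);
  }
}
```

For a schema with non-stored DVs dv1 (flag true) and dv2 (flag false), the true mode yields only dv1, while the false mode yields both.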


bq.    The wantsAllFields flag added to SolrIndexSearcher.doc doesn't seem 
necessary. I guess it was added because the patch adds non stored doc values 
fields to the 'fnames' but if we can separate out stored fnames from the 
non-stored doc values to be returned then we can remove this param from both 
SolrIndexSearcher.doc and SolrIndexSearcher.getNonStoredDocValuesFieldNames
I think the original motivation was to handle cases like {{fl=\*,nonstoredDv1}}. 
The initial idea was that {{\*}} returns all stored fields and nonstoredDv1 is 
added to them. Now that {{\*}} covers all stored fields as well as the 
non-stored DVs (those with useDocValuesAsStored=true), this logic isn't needed. 
So the wantsAllFields flag was a leftover from a previous patch, and I've now 
removed it.
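Without a wantsAllFields flag, a request like fl=*,nonstoredDv1 can be resolved by combining the two sets that getNonStoredDVs() produces. A rough sketch, assuming both sets are precomputed; FlResolver and its parameter names are hypothetical, and glob patterns such as dv* are deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: decides which non-stored docValues fields to read for one fl.
final class FlResolver {
  /**
   * @param fl               requested fl entries, e.g. ["*", "nonstoredDv1"]
   * @param allNonStoredDVs  every non-stored DV (as from getNonStoredDVs(false))
   * @param useDvAsStoredDVs non-stored DVs flagged useDocValuesAsStored (getNonStoredDVs(true))
   */
  static Set<String> dvFieldsToRead(List<String> fl,
                                    Set<String> allNonStoredDVs,
                                    Set<String> useDvAsStoredDVs) {
    Set<String> out = new LinkedHashSet<>();
    for (String entry : fl) {
      if (entry.equals("*")) {
        out.addAll(useDvAsStoredDVs); // fl=*: only DVs flagged useDocValuesAsStored
      } else if (allNonStoredDVs.contains(entry)) {
        out.add(entry);               // explicit name: any non-stored DV, flag ignored
      }
      // anything else is a stored (or unknown) field and is not read from docValues
    }
    return out;
  }
}
```

Because the * rule and the explicit-name rule simply add to one set, fl=*,nonstoredDv1 needs no special flag: nonstoredDv1 is included even if its useDocValuesAsStored flag is false.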

bq.    The pattern matching in the DocsStreamer constructor makes me a bit nervous. 
Where is the pattern matching done for current stored fields?
Keith can probably weigh in on this better. However, I had a look and found 
that response writers (e.g. JSONResponseWriter) receive the whole SolrDocument 
in their {{writeSolrDocument()}} method, which then drops the fields it doesn't 
need as follows:
{code}
    for (String fname : doc.getFieldNames()) {
      if (returnFields != null && !returnFields.wantsField(fname)) {
        continue; // skip fields the request's fl did not ask for
      }
      // ... write the field name and its value(s)
    }
{code}
This wantsField() call handles wildcards. Given this, our handling of patterns 
in DocsStreamer seems fine. It doesn't look costly to me, since it is performed 
only when fl contains a pattern, and that pattern is checked only against the 
non-stored DVs. Do you think there's something better that can be done which 
I'm missing?
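To make the cost argument concrete, here is a hedged sketch of matching fl glob patterns against only the non-stored DV names. The hand-written '*'-only glob matcher and the class name DvPatternFilter are assumptions for illustration; Solr's own wantsField() wildcard handling may differ in detail.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of filtering non-stored DV names by fl entries.
final class DvPatternFilter {
  // Turns a glob such as "dv_*" into an anchored regex; only '*' is special,
  // every other character (including '.') is matched literally.
  static Pattern globToRegex(String glob) {
    StringBuilder sb = new StringBuilder();
    int start = 0;
    for (int i = 0; i < glob.length(); i++) {
      if (glob.charAt(i) == '*') {
        if (i > start) {
          sb.append(Pattern.quote(glob.substring(start, i)));
        }
        sb.append(".*");
        start = i + 1;
      }
    }
    if (start < glob.length()) {
      sb.append(Pattern.quote(glob.substring(start)));
    }
    return Pattern.compile(sb.toString());
  }

  // Returns the non-stored DVs selected by the fl entries.
  static Set<String> matchNonStoredDVs(Iterable<String> flEntries, Set<String> nonStoredDVs) {
    Set<String> out = new LinkedHashSet<>();
    for (String entry : flEntries) {
      if (entry.indexOf('*') < 0) {
        if (nonStoredDVs.contains(entry)) {
          out.add(entry); // plain name: a single set lookup, no regex
        }
        continue;
      }
      Pattern re = globToRegex(entry);
      for (String dv : nonStoredDVs) {
        if (re.matcher(dv).matches()) { // matches() requires a full-string match
          out.add(dv);
        }
      }
    }
    return out;
  }
}
```

Plain names cost one set lookup; a regex is compiled only for entries that contain '*', and it is run only over the set of non-stored DVs, not over every field in the document.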

bq.    The conditional logic in SolrIndexSearcher.decorateDocValueFields for 
multi-valued fields is too complicated! Can we please simplify this?
Made it simpler. :-)
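Whatever shape the multi-valued branch takes, the values it reads come back in docValues order, which is the caveat raised in the issue description. A small model of that behavior, assuming Lucene's two multi-valued docValues types: SORTED_SET returns each document's values sorted and de-duplicated, while SORTED_NUMERIC returns them sorted with duplicates kept. Which type a given Solr field uses depends on its field type; the class below only simulates the ordering with JDK collections, and String ordering matches Lucene's byte ordering only for ASCII values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Simulation only: models what a multi-valued field looks like after a round trip
// through docValues, compared with the order in which values were indexed.
final class DocValuesOrderDemo {
  // SORTED_SET-like: values come back sorted, and duplicates collapse to one entry.
  static List<String> asSortedSet(List<String> indexed) {
    return new ArrayList<>(new TreeSet<String>(indexed));
  }

  // SORTED_NUMERIC-like: values come back sorted, duplicates are kept.
  static List<Long> asSortedNumeric(List<Long> indexed) {
    List<Long> copy = new ArrayList<>(indexed);
    Collections.sort(copy);
    return copy;
  }
}
```

So a document indexed with ["red", "blue", "red"] would be returned as ["blue", "red"] in the SORTED_SET case, which is why the original insertion order cannot be recovered from docValues alone.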


> Read field from docValues for non stored fields
> -----------------------------------------------
>
>                 Key: SOLR-8220
>                 URL: https://issues.apache.org/jira/browse/SOLR-8220
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Keith Laban
>         Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch, 
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, 
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which 
> requires redundant data to be stored on disk. Since reading from docValues is 
> both efficient and a common practice (facets, analytics, streaming, etc.), 
> reading values from docValues when a stored version of the field does not 
> exist would be a valuable disk usage optimization.
> The only caveat I can see is with multiValued fields, as they would always 
> be returned sorted with the docValues approach. I believe this is a fair 
> compromise.
> I've done a rough implementation for this as a field transform, but I think 
> it should live closer to where stored fields are loaded in the 
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values from docValues: 
> facets, analytics, streaming, etc. all seem to do it their own way. Perhaps 
> some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
>   -- return the field from docValues if it is not stored but has docValues; 
> if the field is stored, return it from stored fields
> - fl="*"
>   -- return only stored fields
> - fl="+"
>    -- return stored fields and docValue fields
> 2a would be the easiest implementation and might be sufficient for a first 
> pass; 2b is the current behavior.
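For comparison, the fl rules proposed in the original description can be modelled as below. FieldSource and ProposedFl are hypothetical names used only for this sketch; note that the later comment above describes fl=* as already covering non-stored DVs with useDocValuesAsStored=true, which differs from the * rule proposed here.

```java
// Hypothetical model of the fl rules from the original proposal; not Solr code.
enum FieldSource { STORED, DOC_VALUES, NONE }

final class ProposedFl {
  // fl="docValueField": prefer the stored value, fall back to docValues.
  static FieldSource sourceFor(boolean stored, boolean docValues) {
    if (stored) {
      return FieldSource.STORED;
    }
    if (docValues) {
      return FieldSource.DOC_VALUES;
    }
    return FieldSource.NONE;
  }

  // fl="*": stored fields only; fl="+": stored fields plus docValues fields.
  static boolean includeInWildcard(String wildcard, boolean stored, boolean docValues) {
    switch (wildcard) {
      case "*":
        return stored;
      case "+":
        return stored || docValues;
      default:
        throw new IllegalArgumentException("not a proposed wildcard: " + wildcard);
    }
  }
}
```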



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
