[
https://issues.apache.org/jira/browse/SOLR-8220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ishan Chattopadhyaya updated SOLR-8220:
---------------------------------------
Attachment: SOLR-8220.patch
Thanks for your review, Shalin. I've updated the patch to address your
suggestions.
bq. The SolrIndexSearcher.decorateDocValueFields method has a
honourUseDVsAsStoredFlag which is always true. We can remove it?
bq. Same for SolrIndexSearcher.getNonStoredDocValuesFieldNames?
Refactored the decorateDocValues() a bit to not send in wantsAllFields flag to
the method and to handle it at the DocsStreamer itself. Hence, now, the
decorateDocValues() method takes in only the field names it needs to do
anything about; the filtering for non-stored dvs is taken care of at
DocsStreamer.next() itself.
Since, for the {{fl=\*}} case, we need all non-stored DVs that have
{{useDocValuesAsStored}}=true, but for the general filtering case of
{{fl=dv1,dv2}} we need to filter using all non-stored DVs (irrespective of the
useDocValuesAsStored flag), I've retained this true/false logic in the
getNonStoredDocValuesFieldNames() method. Renamed that method, however, to call
it {{getNonStoredDVs(boolean onlyUseDocValuesAsStored)}} and added a clear
javadoc to this effect.
bq. The wantsAllFields flag added to SolrIndexSearcher.doc doesn't seem
necessary. I guess it was added because the patch adds non stored doc values
fields to the 'fnames' but if we can separate out stored fnames from the
non-stored doc values to be returned then we can remove this param from both
SolrIndexSearcher.doc and SolrIndexSearcher.getNonStoredDocValuesFieldNames
I think the original motivation was to deal with cases {{fl=\*,nonstoredDv1}}.
Here, the idea initially was that {{\*}} returns all stored fields, and
nonstoredDv1 is added to it. But now, since {{\*}} takes care of all stored and
non-stored dvs, this logic isn't needed. So, this wantsAllFields flag was a
left over from a previous patch which I've now removed.
bq. The pattern matching in the DocStreamer constructor makes a bit nervous.
Where is the pattern matching done for current stored fields?
Keith can weigh in on this better. However, I had a look, and found that
responseWriters (e.g. JSONResponseWriter) get the whole SolrDocument at the
{{writeSolrDocument()}} method, from where it does the following call to drop
fields it doesn't need:
{code}
for (String fname : doc.getFieldNames()) {
if (returnFields!= null && !returnFields.wantsField(fname)) {
continue;
}
{code}
This wantsField() call uses wildcard handling.
So, reviewing this information, it seems like our handling of this at the
DocsStreamer is fine here. It doesn't look costly to me, since it is performed
only when fl has a pattern, and that pattern is checked against only non-stored
DVs. Do you think there's something better that can be done which I'm missing?
bq. The conditional logic in SolrIndexSearcher.decorateDocValueFields for
multi-valued fields is too complicated! Can we please simplify this?
Made it simpler. :-)
> Read field from docValues for non stored fields
> -----------------------------------------------
>
> Key: SOLR-8220
> URL: https://issues.apache.org/jira/browse/SOLR-8220
> Project: Solr
> Issue Type: Improvement
> Reporter: Keith Laban
> Attachments: SOLR-8220-5x.patch, SOLR-8220-ishan.patch,
> SOLR-8220-ishan.patch, SOLR-8220-ishan.patch, SOLR-8220-ishan.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch, SOLR-8220.patch,
> SOLR-8220.patch, SOLR-8220.patch
>
>
> Many times a value will be both stored="true" and docValues="true" which
> requires redundant data to be stored on disk. Since reading from docValues is
> both efficient and a common practice (facets, analytics, streaming, etc),
> reading values from docValues when a stored version of the field does not
> exist would be a valuable disk usage optimization.
> The only caveat with this that I can see would be for multiValued fields as
> they would always be returned sorted in the docValues approach. I believe
> this is a fair compromise.
> I've done a rough implementation for this as a field transform, but I think
> it should live closer to where stored fields are loaded in the
> SolrIndexSearcher.
> Two open questions/observations:
> 1) There doesn't seem to be a standard way to read values for docValues,
> facets, analytics, streaming, etc, all seem to be doing their own ways,
> perhaps some of this logic should be centralized.
> 2) What will the API behavior be? (Below is my proposed implementation)
> Parameters for fl:
> - fl="docValueField"
> -- return field from docValue if the field is not stored and in docValues,
> if the field is stored return it from stored fields
> - fl="*"
> -- return only stored fields
> - fl="+"
> -- return stored fields and docValue fields
> 2a - would be easiest implementation and might be sufficient for a first
> pass. 2b - is current behavior
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]