[
https://issues.apache.org/jira/browse/SOLR-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675931#comment-16675931
]
Hoss Man commented on SOLR-8273:
--------------------------------
At the Activate Conference last month I talked with some folks who have some
very big Solr installations who mentioned that they have {{docValues="true"}}
enabled on every fieldtype in their schema(s), not because they need/want to
use them, but because it's the only way to ensure that a stray/mistaken request
to sort/facet on one of these fields won't cause the heap usage to blow up
building FieldCache – they wind up paying a huge indexing & disk usage cost for
these docValues that they explicitly don't want!
That got me rethinking this issue, and how easy I remembered thinking it would
be to add an {{uninvertible=false}} option for fieldTypes, and wanting to
sanity check how hard the impl would actually be. I tried it out and the answer
is "very easy" ... to the point that I'm incredibly embarrassed at the fact that
we haven't done so yet.
I think we should *definitely* add {{uninvertible=false}} as an option in the
soonest possible release...
... _however_ ...
... the more I look at it and how existing code deals with
docValues/FieldCache, the less convinced I am that we should "rush" changing
the default to {{uninvertible=false}} (when schema {{version > 1.6}} ). The key
reasons for my hesitation have to do with the existing behavior of faceting
(both SimpleFacets and JSON Facets) when dealing with fields that are
{{docValues="false" indexed="false"}} – both the default behavior as well as
what happens if you try to force an explicit facet algorithm (ie:
{{facet.method=XXX}} and {{method: XXX}} ) on a field that is only indexed or
only docValues, or neither ... the short version is we don't ever return an
explicit error message if we can't facet on a field (in the method requested);
we just return an empty list of buckets.
That existing behavior makes me very leery of changing the default FieldCache
behavior – even dependent on a new {{version="1.7"}} for schemas – just because
of how confusing it might be for new users, or existing users who create new
collections using the new {{_default}} schema (not to mention users who might
be reading old tutorials/docs/blogs/etc...).
I feel like _before_ we consider changing the default behavior, we should
probably have a much more in-depth conversation as a community about if/how we
want to change the automatic facet method selection for fields based on if/when
they are uninvertible, and if/how we want to "fail loudly" when an explicit
method is provided by the user. ... *BUT* ... I still think we should ASAP
provide the _option_ for users who *know* they don't want FieldCaches to be
created to be able to say that – and give these users/fields facet behavior
consistent with what would happen if they were {{indexed="false"}}.
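To make that concrete, here is a rough sketch of what opting out might look like in a schema, assuming the option ends up spelled {{uninvertible}} on fieldTypes as suggested above. The fieldType name {{string_no_fc}} and the field name {{sku}} are made up for illustration, and none of this syntax is final:

{code:xml}
<!-- hypothetical: a string type that should never be uninverted into the FieldCache -->
<fieldType name="string_no_fc" class="solr.StrField"
           docValues="false" uninvertible="false"/>

<!-- hypothetical field using that type; still indexed for searching -->
<field name="sku" type="string_no_fc" indexed="true" stored="true"/>
{code}

The intent would be that a stray facet request on {{sku}} behaves the way it would for an {{indexed="false"}} field, rather than building a FieldCache entry on the heap.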
With that in mind, I'm going to create 2 sub-tasks for this jira, and attach
the patch(es) with my work in progress so far (and associated "TODO" lists) for
consideration.
I'm interested in feedback – not just on the patches (ideally as comments in
the sub-task issues themselves), but also (here) if anyone has any specific
concerns about the idea of splitting up my previous proposal such that we can
make this {{uninvertible=false}} option available ASAP (ideally in the next
7.x release), while deferring the discussion about changing the default value
to {{false}}?
> deprecate implicitly uninverted fields, force people to either use docValues,
> or be explicit that they want query time uninversion
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-8273
> URL: https://issues.apache.org/jira/browse/SOLR-8273
> Project: Solr
> Issue Type: Improvement
> Components: Schema and Analysis
> Reporter: Hoss Man
> Priority: Major
>
> once upon a time, there was nothing we could do to *stop* people from using
> the FieldCache - even if they didn't realize they were using it.
> Then DocValues was added - and now people have a choice: they can set
> {{docValues=true}} on a field/fieldtype and know that when they do
> functions/sorting/faceting on that field, it won't require a big hunk of ram
> and a big stall every time a reader was reopened. But it's easy to overlook
> when clients might be doing something that required the FieldCache w/o
> realizing it -- and there is no way to stop them, because Solr automatically
> uses UninvertingReader under the covers and automatically allows every field
> to be uninverted in this way.
> we should change that.
> ----
> Straw man proposal...
> * introduce a new boolean fieldType/field property {{uninvertable}}
> * all existing FieldType classes should default to {{uninvertable==false}}
> * a field or fieldType that contains {{indexed="false" uninvertable="true"}}
> should be an error.
> * the Schema {{version}} value should be incremented, such that any Schema
> with an older version is treated as if every field with {{docValues==false}}
> has an implicit {{uninvertable="true"}} on it.
> * the Map passed to UninvertingReader should now only list items that have an
> effective value of {{uninvertable==true}}
> * sample schemas should be updated to use docValues on any field where the
> examples using those schemas suggest using those fields in that way (ie:
> sorting, faceting, etc...)