[ https://issues.apache.org/jira/browse/SOLR-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16675931#comment-16675931 ]

Hoss Man commented on SOLR-8273:
--------------------------------

At the Activate Conference last month I talked with some folks who have some 
very big Solr installations who mentioned that they have {{docValues="true"}} 
enabled on every fieldtype in their schema(s), not because they need/want to 
use them, but because it's the only way to ensure that a stray/mistaken request 
to sort/facet on one of these fields won't cause the heap usage to blow up 
building FieldCache. They wind up paying a huge indexing & disk usage cost for 
these docValues that they explicitly don't want!

That got me rethinking this issue, and how easy I remembered thinking it would 
be to add an {{uninvertible=false}} option for fieldTypes, and wanting to 
sanity check how hard the impl would actually be. I tried it out and the answer 
is "very easy" ... to the point that I'm incredibly embarrassed at the fact that
we haven't done so yet.

I think we should *definitely* add {{uninvertible=false}} as an option in the 
soonest possible release...

... _however_ ...

... the more I look at it and how existing code deals with 
docValues/FieldCache, the less convinced I am that we should "rush" changing 
the default to {{uninvertible=false}} (when schema {{version > 1.6}}). The key 
reasons for my hesitation have to do with the existing behavior of faceting 
(both SimpleFacets and JSON Facets) when dealing with fields that are 
{{docValues="false" indexed="false"}}: both the default behavior and 
what happens if you try to force an explicit facet algorithm (i.e.: 
{{facet.method=XXX}} and {{method: XXX}}) on a field that is only indexed, 
only docValues, or neither. The short version is that we never return an 
explicit error message if we can't facet on a field (using the method 
requested); we just return an empty list of buckets.

That existing behavior makes me very leery of changing the default FieldCache 
behavior (even if dependent on a new {{version="1.7"}} for schemas) just because 
of how confusing it might be for new users, or existing users who create new
collections using the new {{_default}} schema (not to mention users who might 
be reading old tutorials/docs/blogs/etc...).

I feel like _before_ we consider changing the default behavior, we should 
probably have a much more in depth conversation as a community about if/how we 
want to change the automatic facet method selection for fields based on if/when 
they are uninvertible, and if/how we want to "fail loudly" when an explicit 
method is provided by the user. ... *BUT* ... I still think we should ASAP 
provide the _option_ for users who *know* they don't want FieldCaches to be 
created to be able to say that, and give these users/fields facet behavior 
consistent with what would happen if they were {{indexed="false"}}.
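To make the intended semantics concrete, here is a minimal Java sketch, assuming a deliberately simplified model in which a facet request can read values either from docValues or from an uninverted FieldCache. The record {{FieldFlags}}, the enum {{Source}}, and the method {{availableSources}} are hypothetical names for illustration only, not Solr's API; real facet method selection in SimpleFacets and JSON Facets weighs many more factors. The point it shows is that an indexed field with {{uninvertible=false}} and no docValues ends up with no usable source, just like a field with {{indexed="false"}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which value sources a facet request could draw on
// for a field, given its schema flags. Names are illustrative, not Solr's API.
final class FacetSourceSketch {

    // Hypothetical per-field flags mirroring the schema attributes discussed above.
    record FieldFlags(String name, boolean indexed, boolean docValues, boolean uninvertible) {}

    enum Source { DOC_VALUES, FIELD_CACHE }

    // docValues can be used whenever docValues="true"; an uninverted FieldCache
    // can only be built if the field is indexed AND allowed to be uninverted.
    static List<Source> availableSources(FieldFlags f) {
        List<Source> sources = new ArrayList<>();
        if (f.docValues()) {
            sources.add(Source.DOC_VALUES);
        }
        if (f.indexed() && f.uninvertible()) {
            sources.add(Source.FIELD_CACHE);
        }
        return sources;
    }

    public static void main(String[] args) {
        FieldFlags optedOut = new FieldFlags("category", true, false, false);
        FieldFlags notIndexed = new FieldFlags("category", false, false, true);
        // Both print [] : for faceting purposes the opted-out field looks
        // exactly like an indexed="false" field without docValues.
        System.out.println(availableSources(optedOut));
        System.out.println(availableSources(notIndexed));
    }
}
```

Gating the FieldCache path on both flags means an opted-out field still facets normally when it has docValues, but never causes a FieldCache to be built.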

With that in mind, I'm going to create 2 sub-tasks for this jira, and attach 
the patch(es) with my work in progress so far (and associated "TODO" lists) for 
consideration.

I'm interested in feedback, not just on the patches (ideally as comments in 
the sub-task issues themselves), but also (here) on whether anyone has any 
specific concerns about the idea of splitting up my previous proposal such that 
we can make this {{uninvertible=false}} option available ASAP (ideally in the 
next 7.x release), while deferring the discussion of changing the default value 
to {{false}}?

> deprecate implicitly uninverted fields, force people to either use docValues, 
> or be explicit that they want query time uninversion
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8273
>                 URL: https://issues.apache.org/jira/browse/SOLR-8273
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Hoss Man
>            Priority: Major
>
> Once upon a time, there was nothing we could do to *stop* people from using 
> the FieldCache, even if they didn't realize they were using it.
> Then DocValues was added, and now people have a choice: they can set 
> {{docValues=true}} on a field/fieldtype and know that when they do 
> functions/sorting/faceting on that field, it won't require a big hunk of RAM 
> and a big stall every time a reader is reopened.  But it's easy to overlook 
> when clients might be doing something that requires the FieldCache w/o 
> realizing it, and there is no way to stop them, because Solr automatically 
> uses UninvertingReader under the covers and automatically allows every field 
> to be uninverted in this way.
> We should change that.
> ----
> Straw man proposal...
> * introduce a new boolean fieldType/field property {{uninvertable}}
> * all existing FieldType classes should default to {{uninvertable==false}}
> * a field or fieldType that contains {{indexed="false" uninvertable="true"}} 
> should be an error.
> * the Schema {{version}} value should be incremented, such that any Schema 
> with an older version is treated as if every field with {{docValues==false}} 
> has an implicit {{uninvertable="true"}} on it.
> * the Map passed to UninvertingReader should now only list items that have an 
> effective value of {{uninvertable==true}}
> * sample schemas should be updated to use docValues on any field where the 
> examples using those schemas suggest using those fields in that way (ie: 
> sorting, faceting, etc...)
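The straw-man rules above can be sketched in Java as follows. This is a hypothetical illustration, not Solr code: the {{Field}} record, the {{NEW_RULES_VERSION}} cutoff of 1.7, and the helper methods are assumptions chosen for the example; the real change would live in Solr's schema parsing and in whatever builds the field mapping for UninvertingReader. An unset {{uninvertable}} is modeled as {{null}}, and the implicit {{true}} for older schemas is limited to indexed fields, since an unindexed field has no terms to uninvert.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the straw-man rules; names and the version cutoff are illustrative.
final class UninvertableRulesSketch {

    // Hypothetical schema version at which the new defaults would take effect.
    static final float NEW_RULES_VERSION = 1.7f;

    // uninvertable is a Boolean: null means "not set explicitly in the schema".
    record Field(String name, boolean indexed, boolean docValues, Boolean uninvertable) {}

    // Rule: indexed="false" uninvertable="true" is a configuration error.
    static void validate(Field f) {
        if (!f.indexed() && Boolean.TRUE.equals(f.uninvertable())) {
            throw new IllegalArgumentException(
                "field '" + f.name() + "': uninvertable=\"true\" requires indexed=\"true\"");
        }
    }

    // Effective value: an explicit setting wins. Otherwise older schemas treat
    // every (indexed) docValues=false field as uninvertable; newer ones default to false.
    static boolean effectiveUninvertable(Field f, float schemaVersion) {
        if (f.uninvertable() != null) {
            return f.uninvertable();
        }
        return schemaVersion < NEW_RULES_VERSION && f.indexed() && !f.docValues();
    }

    // Only fields whose effective value is true would be listed in the
    // mapping handed to the uninverting reader.
    static Set<String> fieldsToUninvert(List<Field> fields, float schemaVersion) {
        Set<String> names = new LinkedHashSet<>();
        for (Field f : fields) {
            validate(f);
            if (effectiveUninvertable(f, schemaVersion)) {
                names.add(f.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Field> fields = List.of(
            new Field("title", true, false, null),  // no docValues, not set explicitly
            new Field("price", true, true, null),   // has docValues
            new Field("tags", true, false, true));  // explicitly opted in
        System.out.println(fieldsToUninvert(fields, 1.6f)); // [title, tags]
        System.out.println(fieldsToUninvert(fields, 1.7f)); // [tags]
    }
}
```

Under the older schema version the implicit rule keeps today's behavior for {{title}}; under the newer one only the explicit opt-in survives.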



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
