Thanks Shawn and Michael, this is really helpful and it all makes sense.

On Fri, May 20, 2022 at 7:36 PM Michael Gibney <[email protected]> wrote:

> (echoing Shawn because I was about to hit send anyway):
>
> The process of "uninverting" a field involves running through the
> dictionary of indexed terms for a given field, and building an on-heap data
> structure that provides "doc => term" lookup (analogous to docValues), as
> opposed to "term => doc" lookup, which is standard for an indexed field.
> The downside is that you'll have a searcher warmup-latency cost associated
> with uninverting the field (and building the docValues-like data structure),
> in addition to the (potentially quite large) heap space allocation that
> contributes static overhead to heap space requirements, must be traversed
> by GC operations, etc.
>
> In most cases that need docValues-type access, you really want to use
> actual docValues (i.e. "docValues=true"), which allows these data structures
> to be directly disk-backed -- effectively off-heap, but with efficient
> OS-level caching based on memory-mapped files. There are a few cases where
> you may still need to rely on "uninvertible=true": e.g., if you want to
> facet on tokenized values of a text field, "uninvertible=true" is currently
> the only way to go, because there's no way to have post-analysis
> docValues (which would be required for the indexed terms and the terms as
> represented in docValues to be compatible).
>
> "uninvertible=false" is generally useful as a sanity check to make sure
> you're not unknowingly relying on this legacy/backcompat "uninversion"
> behavior. If there were a way to have "uninvertible" globally default to
> "false", I would recommend doing so. But I think there is not at the
> moment, so manually configuring "uninvertible=false" and adding
> "docValues=true" or "uninvertible=true" as necessary (preferred in that
> order) is generally a good recommendation.
>
> On Fri, May 20, 2022 at 1:32 PM Shawn Heisey <[email protected]> wrote:
>
> > On 5/19/22 01:13, Vincenzo D'Amore wrote:
> > > As far as I understand, we should always set the property
> > > uninvertible=false to prevent Solr from building "up large in memory
> > > data structure to serve in place of DocValues", and this is said to be
> > > good for "stability", without explaining exactly what that means.
> > >
> > > Could anyone please describe this better, for example describing a
> > > worst case scenario and a good one?
> >
> > If the class used in the fieldType is one that supports docValues, then
> > you should probably set uninvertible to false. And if you need any
> > features on that field (like facets) that require an uninverted view of
> > the index, be sure that docValues is true.
> >
> > Some fieldType classes, TextField being the one that comes to mind,
> > cannot support docValues. In general I would recommend setting
> > uninvertible to false on that kind of field as well, but if you actually
> > did want to do something like facets on such a field, you would need
> > uninvertible to be set to true.
> >
> > Thanks,
> > Shawn
> >
>
--
Vincenzo D'Amore
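
For illustration, the "uninverting" step Michael describes can be sketched in a few lines of Python. This is only a toy model of the idea, not Solr's or Lucene's actual implementation: the sample index and the names `inverted`, `uninvert` and `facet_counts` are made up for this example. It walks the term dictionary ("term => doc") once and builds an in-memory "doc => term" map, which is what a facet count then reads from.

```python
from collections import defaultdict

# Hypothetical inverted index for a single field: term -> list of doc IDs.
# This mirrors the standard "term => doc" lookup of an indexed field.
inverted = {
    "apple": [0, 2],
    "banana": [1],
    "cherry": [0, 1, 2],
}


def uninvert(term_to_docs):
    """Walk every term in the dictionary and build a doc -> terms map.

    The whole result lives in memory, which is the toy analogue of the
    on-heap structure (and the warmup cost) discussed in the thread.
    """
    doc_to_terms = defaultdict(list)
    for term in sorted(term_to_docs):
        for doc_id in term_to_docs[term]:
            doc_to_terms[doc_id].append(term)
    return dict(doc_to_terms)


def facet_counts(doc_to_terms, matching_docs):
    """Count terms across the matching docs using the doc -> terms view."""
    counts = defaultdict(int)
    for doc_id in matching_docs:
        for term in doc_to_terms.get(doc_id, []):
            counts[term] += 1
    return dict(counts)


uninverted = uninvert(inverted)
print(uninverted)
print(facet_counts(uninverted, [0, 1]))
```

In this sketch the cost of `uninvert` grows with the number of terms and postings in the field, and the result has to be rebuilt whenever the index changes. That is roughly why the advice above prefers real docValues, which are written at index time and read from disk-backed files instead.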
