[
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818379#comment-13818379
]
Shai Erera commented on LUCENE-5333:
------------------------------------
Why is it overkill? The same AllFacetsAccumulator can be used for SortedSet as
well as Taxonomy-based faceting, so we don't need to duplicate the logic between
them. Also, I think that adding it directly to SSDVAccumulator or
TaxonomyFacetsAccumulator kind of makes "sparse faceting" the norm rather than
the outlier. I don't think it's *that* common a use case, and having a dedicated
accumulator seems better to me. It certainly keeps those accumulators' code
focused on what they're supposed to do, without complicating it for the
casual reader.
It should be a very simple accumulator, like what you did in .accumulate(),
but it will allow us to improve things in the future. We could even factor the
logic into FA.create(): if it receives a null list of requests, it allocates the
proper AllFacetsAccumulator.
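To make the FA.create() idea concrete, here is a minimal sketch of the dispatch.
All types below are hypothetical, simplified stand-ins written for illustration;
none of them are the actual Lucene classes or signatures, and the "null means
count everything" convention is only the assumption suggested above.

```java
import java.util.List;

// Hypothetical stand-in for an accumulator; NOT the Lucene FacetsAccumulator API.
interface AccumulatorSketch {
    String describe();
}

// Counts only the dimensions that were explicitly requested.
final class RequestedFacetsAccumulatorSketch implements AccumulatorSketch {
    private final List<String> requestedDims;

    RequestedFacetsAccumulatorSketch(List<String> requestedDims) {
        this.requestedDims = requestedDims;
    }

    @Override
    public String describe() {
        return "requested:" + requestedDims;
    }
}

// Counts every dimension that the hits actually contain.
final class AllFacetsAccumulatorSketch implements AccumulatorSketch {
    @Override
    public String describe() {
        return "all-dims";
    }
}

final class AccumulatorFactorySketch {
    private AccumulatorFactorySketch() {}

    // Assumption: a null request list selects the dedicated "all facets" accumulator.
    static AccumulatorSketch create(List<String> requestedDims) {
        if (requestedDims == null) {
            return new AllFacetsAccumulatorSketch();
        }
        return new RequestedFacetsAccumulatorSketch(requestedDims);
    }
}
```

With this shape, the existing accumulators stay untouched and the sparse case
lives in one small class that the factory picks when no requests are given.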
So I'm curious: did you try a dedicated class and run into trouble?
About the code in the patch:
* Is there a reason not to allocate the CFRs up front and set them on the
FSP? I mean, the only difference between that and what you do now is that the
CFRs are currently allocated transiently, but I don't think allocating them up
front should be an issue in general (how many dims will those apps have?). It
might also allow us to extend this support to sampling in the future.
* You sort the FacetResult based on the FResNode.value (their root). Does
SortedSet always assign a value to the root of a FacetResult.node?
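To illustrate why the root value matters for that sort, here is a small sketch
of ordering per-dimension results by their root value, highest first. The
DimResultSketch type is a hypothetical stand-in, not the Lucene FacetResult or
FacetResultNode class, and the sketch assumes the root value holds the
dimension's aggregate count. If the root value were never populated, every
result would compare equal and the order would simply be the input order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for one dimension's result; NOT the Lucene FacetResult class.
final class DimResultSketch {
    final String dim;
    final double rootValue; // assumed to hold the dimension's aggregate count

    DimResultSketch(String dim, double rootValue) {
        this.dim = dim;
        this.rootValue = rootValue;
    }
}

final class SortByRootSketch {
    private SortByRootSketch() {}

    // Returns a new list ordered by root value, highest first.
    // List.sort is stable, so results with equal root values keep their input order.
    static List<DimResultSketch> sortByRootValue(List<DimResultSketch> results) {
        List<DimResultSketch> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((DimResultSketch r) -> r.rootValue).reversed());
        return sorted;
    }
}
```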
If you don't feel like handling the Taxonomy case at the moment, that's fine. I
still think we should add an AllFacetsAccumulator with a .create() which wraps
an SSDV. We can add Taxonomy faceting to it later (though I hope it just
means another .create()).
> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
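The counting idea in the quoted description can be sketched roughly as follows.
This is an illustration only: it models each hit as a plain list of
{dimension, label} pairs instead of reading ordinals from doc values, and the
class and method names are hypothetical, not part of the facet module.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SparseFacetCountSketch {
    private SparseFacetCountSketch() {}

    // Counts every {dim, label} pair seen in the hits; no dimensions are declared up front.
    // Each hit is a list of two-element arrays: [0] = dimension, [1] = label.
    static Map<String, Map<String, Integer>> countAll(List<List<String[]>> hits) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (List<String[]> doc : hits) {
            for (String[] facet : doc) {
                counts.computeIfAbsent(facet[0], k -> new HashMap<>())
                      .merge(facet[1], 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to n dimensions that "got traction", ordered by total count (highest first).
    // Ties are broken by dimension name so the result is deterministic.
    static List<String> topDims(Map<String, Map<String, Integer>> counts, int n) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            int total = 0;
            for (int c : e.getValue().values()) {
                total += c;
            }
            totals.put(e.getKey(), total);
        }
        List<String> dims = new ArrayList<>(totals.keySet());
        dims.sort((a, b) -> {
            int byTotal = Integer.compare(totals.get(b), totals.get(a));
            return byTotal != 0 ? byTotal : a.compareTo(b);
        });
        return new ArrayList<>(dims.subList(0, Math.min(n, dims.size())));
    }
}
```

For a "so dimm" style query, the hits would mostly carry memory-related
dimensions, so those would come out on top without the caller naming them.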