[ 
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818379#comment-13818379
 ] 

Shai Erera commented on LUCENE-5333:
------------------------------------

Why is it overkill? The same AllFacetsAccumulator can be used for SortedSet as 
well as Taxonomy based faceting, so we don't need to duplicate the logic between 
them. Also, I think that adding this directly to SSDVAccumulator or 
TaxonomyFacetsAccumulator kind of makes "sparse faceting" the norm rather than 
the outlier. I don't think it's *that* common a use case, and having a dedicated 
accumulator seems better to me. It certainly keeps those accumulators' code 
focused on what they're supposed to do, without complicating it for the 
casual reader.

It should be a very simple accumulator, like what you did in .accumulate(), 
only it will allow us to improve things in the future. We could even factor the 
logic into FA.create(): if it receives a null list of requests, it allocates the 
proper AllFacetsAccumulator.
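
To make the FA.create() idea concrete, here is a minimal sketch of the dispatch. It deliberately uses hypothetical stand-in types (Accumulator, RequestedDimsAccumulator, AllDimsAccumulator, Accumulators) and a toy hit model (a map from dimension to label), not the real facet module classes, so the names and signatures are assumptions for illustration only:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins: none of these are the real Lucene facet classes.
// A hit is modelled as a map from dimension name to the label that doc carries.
interface Accumulator {
    /** Returns dimension -> label -> count over the given hits. */
    Map<String, Map<String, Integer>> accumulate(List<Map<String, String>> hits);
}

/** Counts only the dimensions that were explicitly requested. */
final class RequestedDimsAccumulator implements Accumulator {
    private final List<String> requestedDims;

    RequestedDimsAccumulator(List<String> requestedDims) {
        this.requestedDims = requestedDims;
    }

    @Override
    public Map<String, Map<String, Integer>> accumulate(List<Map<String, String>> hits) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map<String, String> hit : hits) {
            for (Map.Entry<String, String> e : hit.entrySet()) {
                if (requestedDims.contains(e.getKey())) {
                    counts.computeIfAbsent(e.getKey(), d -> new TreeMap<>())
                          .merge(e.getValue(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}

/** Counts every dimension that the hits actually contain. */
final class AllDimsAccumulator implements Accumulator {
    @Override
    public Map<String, Map<String, Integer>> accumulate(List<Map<String, String>> hits) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map<String, String> hit : hits) {
            for (Map.Entry<String, String> e : hit.entrySet()) {
                counts.computeIfAbsent(e.getKey(), d -> new TreeMap<>())
                      .merge(e.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }
}

/** Factory: a null request list selects the "count everything" accumulator. */
final class Accumulators {
    private Accumulators() {}

    static Accumulator create(List<String> requestedDims) {
        if (requestedDims == null) {
            return new AllDimsAccumulator();
        }
        return new RequestedDimsAccumulator(requestedDims);
    }
}
```

With this shape, the existing accumulators stay untouched and callers opt into sparse faceting only by passing null to the factory.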

So I'm curious - did you try a dedicated class and run into trouble?

About the code in the patch:

* Is there a reason not to allocate the CFRs up front and set them on the 
FSP? I mean, the only difference between that and what you do now is that the 
CFRs are now allocated transiently, but I don't think that should be an issue 
in general (like how many dims will those apps have?). Also, it might allow us 
to extend this support to sampling in the future.

* You sort the FacetResults based on FResNode.value (their root). Does 
SortedSet always assign a value to the root of a FacetResult.node?
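
As a rough illustration of the two points above, the sketch below allocates one count request per dimension up front and then orders per-dimension results by their root value. DimRequest, DimResult and UpFrontRequests are hypothetical stand-ins invented for this example (they are not the real CFR / FacetResult classes), and the sketch simply assumes every result carries a root value, which is exactly the open question for SortedSet:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a per-dimension count request.
final class DimRequest {
    final String dim;
    final int topN;

    DimRequest(String dim, int topN) {
        this.dim = dim;
        this.topN = topN;
    }
}

// Hypothetical stand-in for a per-dimension result; rootValue is assumed
// to always be populated, which the real code would need to verify.
final class DimResult {
    final String dim;
    final double rootValue;

    DimResult(String dim, double rootValue) {
        this.dim = dim;
        this.rootValue = rootValue;
    }
}

final class UpFrontRequests {
    private UpFrontRequests() {}

    /** Builds one request per known dimension before any counting happens. */
    static List<DimRequest> forAllDims(Collection<String> dims, int topN) {
        List<DimRequest> requests = new ArrayList<>(dims.size());
        for (String dim : dims) {
            requests.add(new DimRequest(dim, topN));
        }
        return requests;
    }

    /** Returns a copy sorted by root value, highest first; ties break on dim name. */
    static List<DimResult> sortByRootValue(List<DimResult> results) {
        List<DimResult> sorted = new ArrayList<>(results);
        sorted.sort((a, b) -> {
            int byValue = Double.compare(b.rootValue, a.rootValue);
            return byValue != 0 ? byValue : a.dim.compareTo(b.dim);
        });
        return sorted;
    }
}
```

The cost of the up-front list is one small object per dimension, so it only becomes a concern if an index has a very large number of dims.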

If you don't feel like handling the Taxonomy case at the moment, that's fine. I 
still think we should add an AllFacetsAccumulator with a .create() which wraps 
an SSDV. We can add Taxonomy faceting to it later (though I hope it just 
means another .create()).

> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
>                 Key: LUCENE-5333
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5333
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...



