[
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818458#comment-13818458
]
Michael McCandless commented on LUCENE-5333:
--------------------------------------------
bq. Why is it an overkill?
Well, I think the facet module already has too many classes /
abstractions: aggregators, accumulators, ordinal policies, search
params, indexing params, cat paths, encoders, decoders, etc. I think
this (huge API surface area) is a big impediment to users adopting it
and devs contributing to it.
So, I really don't want to make this worse, by adding yet another
Accumulator, that has static factory methods, to create yet other
Accumulators that are subclasses of existing Accumulators. I think
it's too much.
I also don't like mixing concerns: I think that's a sign that
something is wrong. I don't think a single class (AllFA) should be
expected to handle both taxonomy based and SSDV based cases.
We already have classes that count facets using those two methods, so
I think we should just add this capability to each of those classes.
And, if we add the enum facet method (and others), then the natural
place to add sparse handling for it would be in its own class, I
think.
bq. So I'm curious - did you try a dedicated class and run into trouble?
No, I haven't tried: I just didn't really like that approach... so I
focused on the impl instead ...
bq. Is there a reason not to allocate the CFRs up front and set them on
the FSP?
I really don't like the approach of "create CFR for every possible
dim". I realize this is a simple way to implement it, but it seems
wrong. And I especially don't want the API to expose that we are
somehow doing this: it's an impl detail.
So I wanted to get "closer" to not creating all CFRs up-front, and
doing it "transiently" seemed at least a bit better than bringing the
entire list into existence.
But I think I can improve on the patch so that we don't even make a
CFR until we see that at least one of its labels has a non-zero
count ... I'll work towards that.
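
To make the "don't create anything until a label has a non-zero count" idea
concrete, here is a rough sketch in plain Java. It is not the patch and uses
no Lucene classes: DimResult, LazyDimResults, ordToDim and ordToLabel are all
hypothetical names, and the counts array is assumed to have been filled in
already by whatever does the counting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: DimResult stands in for a per-dimension result
// object and is NOT a Lucene class.
class LazyDimResults {

  static final class DimResult {
    final String dim;
    final List<String> labels = new ArrayList<>();
    int total;

    DimResult(String dim) {
      this.dim = dim;
    }
  }

  /**
   * counts[ord] is the number of hits carrying that ordinal;
   * ordToDim / ordToLabel (assumed lookups) map an ordinal to its
   * dimension and label.
   */
  static Map<String, DimResult> collect(
      int[] counts, String[] ordToDim, String[] ordToLabel) {
    Map<String, DimResult> results = new LinkedHashMap<>();
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] == 0) {
        continue; // never seen by the hits: nothing gets created
      }
      // The per-dimension object comes into existence only here, the
      // first time one of its labels turns out to have a non-zero count.
      DimResult r = results.computeIfAbsent(ordToDim[ord], DimResult::new);
      r.labels.add(ordToLabel[ord] + "=" + counts[ord]);
      r.total += counts[ord];
    }
    return results;
  }
}
```

With counts {0, 2, 0, 1} over the ordinals color/red, color/blue, size/XL and
memoryType/ddr3, only "color" and "memoryType" get a DimResult; nothing is
ever allocated for "size".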
bq. You sort the FacetResult based on the FResNode.value (their root). Does
SortedSet always assign a value to the root of a FacetResult.node?
Yes, it does, in the sparse case (I ignore the ord policy).
> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But I think this is very easy for the facet module: since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then at the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
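
The approach in the issue description above (count everything the hits saw,
then keep the dimensions that "got traction") could look roughly like the
following. This is only a sketch under stated assumptions: it is plain Java,
not the facet module API, and SparseDimCounter, docOrds and ordToDim are
made-up names. It also assumes "traction" means the largest total count per
dimension, which the issue does not specify.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Lucene APIs.
class SparseDimCounter {

  /**
   * docOrds[doc] lists every facet ordinal on that document ("row stride").
   * ordToDim[ord] names the dimension each ordinal belongs to.
   * Returns up to topN dimensions ranked by total count across the hits.
   */
  static List<Map.Entry<String, Integer>> topDims(
      int[][] docOrds, int[] hits, String[] ordToDim, int topN) {
    // Pass 1: count every ordinal the hits actually carry. No dimension
    // has to be named up front.
    int[] counts = new int[ordToDim.length];
    for (int doc : hits) {
      for (int ord : docOrds[doc]) {
        counts[ord]++;
      }
    }
    // Pass 2: roll the non-zero counts up to their dimension.
    Map<String, Integer> dimTotals = new HashMap<>();
    for (int ord = 0; ord < counts.length; ord++) {
      if (counts[ord] > 0) {
        dimTotals.merge(ordToDim[ord], counts[ord], Integer::sum);
      }
    }
    // Rank dimensions by total count (ties broken by name) and keep
    // the top ones.
    List<Map.Entry<String, Integer>> ranked =
        new ArrayList<>(dimTotals.entrySet());
    ranked.sort((a, b) -> {
      int byCount = Integer.compare(b.getValue(), a.getValue());
      return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    });
    return ranked.subList(0, Math.min(topN, ranked.size()));
  }
}
```

For example, with docOrds = {{0, 2}, {1, 2}, {3}, {2}}, ordToDim = {"color",
"color", "memoryType", "size"} and hits = {0, 1, 3}, the ranking is
memoryType (3) followed by color (2). The "size" dimension never shows up
because no hit carries it.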