[
https://issues.apache.org/jira/browse/LUCENE-5333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13818466#comment-13818466
]
Shai Erera commented on LUCENE-5333:
------------------------------------
bq. Well, I think the facet module already has too many classes
That's unrelated. It's like saying Lucene has many APIs: IndexWriter,
IndexWriterConfig, Document, Field, MergePolicy, Query, QueryParser, Collector,
IndexReader, IndexSearcher... just to name a few :). What matters here are
FacetsAccumulator and FacetRequest, and that's it. The rest are *totally*
unrelated.
This scenario fits into another accumulator. Otherwise, we'll end up with facet
code diverging left and right. Even now, for no good reason, if you choose to
index facets using SortedSetDV, you can only count them. Why? What prevents
these ords from being weighted by SumScore or a ValueSource? Nothing, I think.
So I'm worried that if you add this only to SortedSetDV, it will widen the gap
between the two indexing approaches.
Rather, I prefer to pick the right API. We say that FacetsAccumulator is your
entry point to accumulating facets. So far we've made FacetsAccumulator.create
adhere to all existing FacetRequests and accumulators and return the proper
one. I think that's a good API. And if all an AllFA needs to do is create dummy
requests and filter out the uninteresting ones, why complicate the code of
all other accumulators (existing and future ones)? Wouldn't it be simpler to
add EnumFacetsAccumulator support to AllFA?
Look, this feature is not rocket science. Besides the fact that I don't think
it's such an important or common feature, the app doesn't really need to go
out of its way to support it: it can easily create all possible FRs using a
very simple API, and filter out the FacetResults whose FRN.subResults is empty.
If we can make a simple utility for these apps, I'm all for it! But I prefer
that we don't complicate the code of existing FAs.
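To make the "request everything, then filter" idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. Everything in it is an assumption for illustration: NonEmptyFacetFilter, nonEmpty and countsForDim are hypothetical names, not facet-module APIs. countsForDim stands in for running one FacetRequest per dimension, and an empty map stands in for an empty FRN.subResults.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical utility; not part of the Lucene facet module.
class NonEmptyFacetFilter {

    /**
     * Asks for every known dimension, then drops the dimensions that came
     * back with no children for the current hits.
     *
     * @param allDims      every dimension the app knows about
     * @param countsForDim stand-in for running one facet request per dimension
     */
    static Map<String, Map<String, Integer>> nonEmpty(
            List<String> allDims,
            Function<String, Map<String, Integer>> countsForDim) {
        // LinkedHashMap keeps the dimensions in the order they were requested.
        Map<String, Map<String, Integer>> kept = new LinkedHashMap<>();
        for (String dim : allDims) {
            Map<String, Integer> children = countsForDim.apply(dim);
            if (children != null && !children.isEmpty()) {
                kept.put(dim, children);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Toy counts for a query that only matched memory modules.
        Map<String, Map<String, Integer>> toy = new LinkedHashMap<>();
        toy.put("Memory Type", Map.of("SO-DIMM", 12));
        toy.put("Capacity", Map.of("8GB", 7, "16GB", 5));

        List<String> allDims = List.of("Memory Type", "Sleeve Length", "Capacity");
        Map<String, Map<String, Integer>> result =
                nonEmpty(allDims, dim -> toy.getOrDefault(dim, Map.of()));

        System.out.println(result.keySet()); // prints [Memory Type, Capacity]
    }
}
```

The point of the sketch is only that the filtering lives entirely on the app side, so no existing accumulator has to change.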
> Support sparse faceting for heterogeneous indices
> -------------------------------------------------
>
> Key: LUCENE-5333
> URL: https://issues.apache.org/jira/browse/LUCENE-5333
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-5333.patch
>
>
> In some search apps, e.g. a large e-commerce site, the index can have
> a mix of wildly different product categories and facet dimensions, and
> the number of dimensions could be huge.
> E.g. maybe the index has shirts, computer memory, hard drives, etc.,
> and each of these many categories has different attributes.
> In such an index, when someone searches for "so dimm", which should
> match a bunch of laptop memory modules, you can't (easily) know up
> front which facet dimensions will be important.
> But, I think this is very easy for the facet module, since ords are
> stored "row stride" (each doc lists all facet labels it has), we could
> simply count all facets that the hits actually saw, and then in the
> end see which ones "got traction" and return facet results for these
> top dims.
> I'm not sure what the API would look like, but conceptually this
> should work very well, because of how the facet module works.
> You shouldn't have to state up front exactly which facet dimensions
> to count...
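The counting idea in the description above can be sketched roughly as follows, in plain Java with no Lucene dependency. This is a simplified illustration under stated assumptions, not how the facet module is implemented: real facet ords are integers, while here each hit simply carries "dim/label" strings. SparseDimCounter, countAll and topDims are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; labels are "dim/label" strings instead of real ordinals.
class SparseDimCounter {

    /** Counts every label carried by the hits, grouped by dimension. */
    static Map<String, Map<String, Integer>> countAll(List<List<String>> hitLabels) {
        Map<String, Map<String, Integer>> byDim = new HashMap<>();
        for (List<String> labels : hitLabels) {
            for (String path : labels) {
                int slash = path.indexOf('/');
                if (slash <= 0) {
                    continue; // skip malformed paths that have no dimension part
                }
                String dim = path.substring(0, slash);
                String label = path.substring(slash + 1);
                byDim.computeIfAbsent(dim, d -> new HashMap<>())
                     .merge(label, 1, Integer::sum);
            }
        }
        return byDim;
    }

    /** Sums the child counts of one dimension. */
    static int total(Map<String, Integer> children) {
        int sum = 0;
        for (int count : children.values()) {
            sum += count;
        }
        return sum;
    }

    /** Returns up to topN dimensions, highest total first, ties broken by name. */
    static List<String> topDims(Map<String, Map<String, Integer>> byDim, int topN) {
        List<String> dims = new ArrayList<>(byDim.keySet());
        dims.sort((a, b) -> {
            int byTotal = Integer.compare(total(byDim.get(b)), total(byDim.get(a)));
            return byTotal != 0 ? byTotal : a.compareTo(b);
        });
        return new ArrayList<>(dims.subList(0, Math.min(topN, dims.size())));
    }

    public static void main(String[] args) {
        // Toy hits for a query like "so dimm"; each hit lists all its labels.
        List<List<String>> hits = List.of(
                List.of("Memory Type/SO-DIMM", "Capacity/8GB"),
                List.of("Memory Type/SO-DIMM", "Capacity/16GB"),
                List.of("Memory Type/SO-DIMM", "Brand/Acme"));

        Map<String, Map<String, Integer>> byDim = countAll(hits);
        System.out.println(topDims(byDim, 2)); // prints [Memory Type, Capacity]
    }
}
```

Nothing is declared up front here: the dimensions that "got traction" fall out of whatever labels the hits actually carried.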