[
https://issues.apache.org/jira/browse/LUCENE-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13822450#comment-13822450
]
Shai Erera commented on LUCENE-5339:
------------------------------------
About the patch:
* I think FacetField should an optional ctor taking the indexedFacetField,
defaulting to $facets, then the ctor calls super() with the right field, and
not "dummy"? And you can remove set/get?
* SimpleFacetsCollector jdocs are wrong -- there's no create?
* Do we still need SSDVFacetFields?
* I like FacetIW, but the nocommit, to handle updateDoc, addDocs etc. makes me
think if down the road we won't be sorry about doing this change (i.e. if
anything changes on IW APIs). The good thing about FacetFields is that it just
adds fields to a Document, and doesn't worry about IW API at all...
* DimType == DimConfig :). Sorry if I wasn't clear somewhere in my long
response.
{quote}
That handler is such an exotic use case ... and the app can just
recurse itself, calling TaxoFacetCounts.getTopChildren?
{quote}
Could be, maybe it will work. It definitely allows asking for different topK
for each child (something we currently don't support).
{quote}
It does, and I think that's OK? Yes, it's one extra ord it indexes,
but that will be a minor perf hit since it's a multi-valued and
hierarchical.
{quote}
I don't know. Like if your facet has 2 levels, that's 33% more ords. I think
the count of the root ord is most likely never needed? And if it's needed, app
can compute it by traversing its children and their values in the facet arrays?
Maybe as a default we just never index it, and don't add a vague
requiresDimCount/Value/Weight boolean?
{quote}
I think replicating bits of this code for the different methods is the lesser
evil than adding so many abstractions that the module is hard to
approach by new devs/users.
{quote}
Did we ever get such feedback from users? That the module is unapproachable?
I get the opposite feedback - that we don't have many abstractions! :)
{quote}
At the end of the day, the code that does the facet counting, the
rollup, pulling the topK, is in fact a small amount of code;
{quote}
You have a nocommit "maybe we should do this lazily" in regards for when to
rollupValues. That shows me that now every developer who extends this API
(let's be clear - users are oblivious to this happening) will facet the same
decision (nocommit). If we discover one day that it's better to rollup lazily
or not, other developers don't benefit from that decision. That's why I think
some abstractions are good.
{quote}
I added ValueSource aggregation in the next patch, but not
associations; I think associations can come later (it's just another
index time and search time impl).
{quote}
I'm not sure we should do that (cut over associations later). The whole point
about these features (associations, complements, sampling..) is that they are
existing features. If we think they are useless / unneeded - that's one thing.
But if we believe they are important, it's useless to make all the API changes
without taking them into account, only to figure out later that we need
abstraction X and Y in order to implement them.
And we make heavy use of associations, and some users asked (and use) sampling
and I remember a question about complements. So obviously we cannot conclude
that these are useless features. Therefore I think it's important that we try
to tackle them now, so don't we don't do a full round trip to find ourselves
with the same API again.
Can we do FacetIndexWriter in a separate issue (if we want to do it at all)?
It's unrelated to the search API changes you want to do here, and it might be
easier to contain within a single issue?
About CategoryListIterator ... what if we do manage to come up tomorrow with a
better encoding strategy for facets. Do you really think that changing all
existing WhateverFacets makes sense!? And if developers write their own
WhateverFacets, it means they need to change their code too? Really, you're
mixing optimizations (inlining dgap+vint) with ease of use. I know (!!) that
there are apps that can benefit from a different encoding scheme (e.g.
FourOnesIntEncoder). We don't need to wait until someone comes up w/ a better
default encoding scheme to introduce abstractions. I mean .. that's just sounds
crazy to me.
> Simplify the facet module APIs
> ------------------------------
>
> Key: LUCENE-5339
> URL: https://issues.apache.org/jira/browse/LUCENE-5339
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Attachments: LUCENE-5339.patch, LUCENE-5339.patch
>
>
> I'd like to explore simplifications to the facet module's APIs: I
> think the current APIs are complex, and the addition of a new feature
> (sparse faceting, LUCENE-5333) threatens to add even more classes
> (e.g., FacetRequestBuilder). I think we can do better.
> So, I've been prototyping some drastic changes; this is very
> early/exploratory and I'm not sure where it'll wind up but I think the
> new approach shows promise.
> The big changes are:
> * Instead of *FacetRequest/Params/Result, you directly instantiate
> the classes that do facet counting (currently TaxonomyFacetCounts,
> RangeFacetCounts or SortedSetDVFacetCounts), passing in the
> SimpleFacetsCollector, and then you interact with those classes to
> pull labels + values (topN under a path, sparse, specific labels).
> * At index time, no more FacetIndexingParams/CategoryListParams;
> instead, you make a new SimpleFacetFields and pass it the field it
> should store facets + drill downs under. If you want more than
> one CLI you create more than one instance of SimpleFacetFields.
> * I added a simple schema, where you state which dimensions are
> hierarchical or multi-valued. From this we decide how to index
> the ordinals (no more OrdinalPolicy).
> Sparse faceting is just another method (getAllDims), on both taxonomy
> & ssdv facet classes.
> I haven't created a common base class / interface for all of the
> search-time facet classes, but I think this may be possible/clean, and
> perhaps useful for drill sideways.
> All the new classes are under oal.facet.simple.*.
> Lots of things that don't work yet: drill sideways, complements,
> associations, sampling, partitions, etc. This is just a start ...
--
This message was sent by Atlassian JIRA
(v6.1#6144)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]