[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051764#comment-14051764 ]
Erick Erickson commented on SOLR-5302:
--------------------------------------
bq: I didn't realize that... can you point me at the discussion?
I misstated that severely; my apologies. What I should have said is more along
the lines that I don't quite know what to do with back-porting the analytics
stuff to 4.x. Or whether we should. It's quite a bit of code, the interface is
complex, and it doesn't play nice in distributed mode. I believe there are
functions that simply won't work distributed. And maybe can't.
Then there's the pluggable analytics framework that's been recently added. I
really wonder whether the right thing to do long-term is to pull this out of 5.x
and port as much as possible into the pluggable analytics framework piecemeal
as necessary, stealing as much as possible and supporting what can be supported
in distributed mode. That still leaves the question of what to do with
functions that are inherently difficult/impossible to support in sharded
environments...
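To make the sharding problem concrete, here is a minimal sketch in plain Java.
It is not Solr code: the class, the method names and the sample data are all
made up for illustration. It assumes each shard can only return a small
summary. A mean merges cleanly from per-shard (sum, count) pairs, while the
obvious shortcut for median (taking the median of the per-shard medians) gives
the wrong answer unless every value is gathered in one place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Solr.
class ShardMergeSketch {

    /** Made-up data: three "shards" holding the values 1..13 between them. */
    static List<List<Double>> sampleShards() {
        return Arrays.asList(
            Arrays.asList(1.0, 2.0, 3.0),
            Arrays.asList(4.0, 5.0, 6.0),
            Arrays.asList(7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0));
    }

    /** Median of one list (mean of the two middle values when the size is even). */
    static double median(List<Double> values) {
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int n = sorted.size();
        return n % 2 == 1
            ? sorted.get(n / 2)
            : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
    }

    /** Mean merged from per-shard (sum, count) summaries; raw values stay on the shard. */
    static double mergedMean(List<List<Double>> shards) {
        double totalSum = 0.0;
        long totalCount = 0;
        for (List<Double> shard : shards) {
            double shardSum = 0.0;          // computed locally on the shard
            for (double v : shard) {
                shardSum += v;
            }
            totalSum += shardSum;           // only two numbers per shard are merged
            totalCount += shard.size();
        }
        return totalSum / totalCount;
    }

    /** Naive shortcut: median of the per-shard medians. Not the true median in general. */
    static double medianOfShardMedians(List<List<Double>> shards) {
        List<Double> medians = new ArrayList<>();
        for (List<Double> shard : shards) {
            medians.add(median(shard));
        }
        return median(medians);
    }

    /** Exact median, which requires shipping every value to a single node. */
    static double exactMedian(List<List<Double>> shards) {
        List<Double> all = new ArrayList<>();
        for (List<Double> shard : shards) {
            all.addAll(shard);
        }
        return median(all);
    }

    public static void main(String[] args) {
        List<List<Double>> shards = sampleShards();
        System.out.println("merged mean:             " + mergedMean(shards));           // 7.0
        System.out.println("median of shard medians: " + medianOfShardMedians(shards)); // 5.0
        System.out.println("exact median:            " + exactMedian(shards));          // 7.0
    }
}
```

With this sample data the merged mean matches the true mean, but the shortcut
median comes out as 5.0 against a true median of 7.0. Getting an exact median
means either moving all the values or making extra round trips to the shards,
which is the kind of cost I have in mind above.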
See SOLR-5963 for some of the other discussion about whether to move this to a
contrib rather than have it be in the mainline code. My concern is that if we
move it to a contrib, it'll just be code that languishes, especially given the
distributed limitations. Would it just be better to use the pluggable
framework? It seems to me that the use-case for single-shard analytics is
becoming less compelling, but that may be a misperception on my part.
I don't want this to read as a decision. Rather, I don't want to introduce
this much code into the mainline tree if it doesn't have wide applicability,
and I think the lack of distributed support severely limits how widely it
applies.
That said, I'm not dogmatically opposed either. But I'd like some sense of what
others think about it.
> Analytics Component
> -------------------
>
> Key: SOLR-5302
> URL: https://issues.apache.org/jira/browse/SOLR-5302
> Project: Solr
> Issue Type: New Feature
> Reporter: Steven Bower
> Assignee: Erick Erickson
> Fix For: 5.0
>
> Attachments: SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch,
> SOLR-5302.patch, Search Analytics Component.pdf, Statistical Expressions.pdf,
> solr_analytics-2013.10.04-2.patch
>
>
> This ticket is to track a "replacement" for the StatsComponent. The
> AnalyticsComponent supports the following features:
> * All functionality of StatsComponent (SOLR-4499)
> * Field Faceting (SOLR-3435)
> ** Support for limit
> ** Sorting (bucket name or any stat in the bucket)
> ** Support for offset
> * Range Faceting
> ** Supports all options of standard range faceting
> * Query Faceting (SOLR-2925)
> * Ability to use overall/field facet statistics as input to range/query
> faceting (i.e. calc min/max date and then facet over that range)
> * Support for more complex aggregate/mapping operations (SOLR-1622)
> ** Aggregations: min, max, sum, sum-of-square, count, missing, stddev, mean,
> median, percentiles
> ** Operations: negation, abs, add, multiply, divide, power, log, date math,
> string reversal, string concat
> ** Easily pluggable framework to add additional operations
> * New / cleaner output format
> Outstanding Issues:
> * Multi-value field support for stats (supported for faceting)
> * Multi-shard support (may not be possible for some operations, e.g. median)