[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051764#comment-14051764 ]

Erick Erickson commented on SOLR-5302:
--------------------------------------

bq: I didn't realize that... can you point me at the discussion?

I misstated that badly; my apologies. What I should have said is closer to this:
I don't quite know what to do about backporting the analytics code to 4.x, or
whether we should at all. It's quite a bit of code, the interface is complex,
and it doesn't play nicely in distributed mode. I believe some functions simply
won't work in distributed mode, and perhaps can't.

Then there's the pluggable analytics framework that was recently added. I
really wonder whether the right long-term move is to pull this out of 5.x and
port it piecemeal into the pluggable analytics framework as needed, reusing as
much of the existing code as possible and supporting whatever can be supported
in distributed mode. That still leaves the question of what to do with
functions that are inherently difficult or impossible to support in sharded
environments...
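To make the distributed concern concrete, here is a minimal sketch in plain Java.
It is not code from the AnalyticsComponent or from Solr itself, and the class and
method names are hypothetical. Statistics such as count, sum, sum of squares,
mean, and stddev can be computed per shard and merged exactly, because each shard
only has to return a few running totals. A median has no such small partial
state: the median of the per-shard medians is generally wrong, so an exact answer
would require shipping every value to the coordinating node.

```java
import java.util.Arrays;

public class ShardMergeSketch {

    // Hypothetical per-shard partial aggregate. These three totals are enough
    // to merge count, sum, mean, and stddev exactly across shards.
    record Partial(long count, double sum, double sumOfSquares) {

        static Partial of(double[] values) {
            double sum = 0, sumOfSquares = 0;
            for (double v : values) {
                sum += v;
                sumOfSquares += v * v;
            }
            return new Partial(values.length, sum, sumOfSquares);
        }

        // Merging is just adding the totals, so shard order doesn't matter.
        Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum,
                    sumOfSquares + other.sumOfSquares);
        }

        double mean() {
            return sum / count;
        }

        // Population standard deviation from the merged totals. This formula can
        // lose precision for large values; Welford-style merging is more robust.
        double stddev() {
            double m = mean();
            return Math.sqrt(sumOfSquares / count - m * m);
        }
    }

    // Exact median; it needs all of the values in one place.
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        double[] shardA = {1, 2, 3};
        double[] shardB = {10, 20};

        Partial merged = Partial.of(shardA).merge(Partial.of(shardB));
        System.out.println("mean = " + merged.mean() + ", stddev = " + merged.stddev());

        // Naive distributed "median": the median of the per-shard medians.
        double naive = median(new double[] {median(shardA), median(shardB)});
        // The exact median needs every value from every shard.
        double exact = median(new double[] {1, 2, 3, 10, 20});
        System.out.println("naive median = " + naive + ", exact median = " + exact);
    }
}
```

With shards {1, 2, 3} and {10, 20}, the merged mean is exact (7.2), while the
median of medians comes out as 8.5 against a true median of 3.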

See SOLR-5963 for some of the other discussion about whether to move this to a
contrib module rather than keep it in the mainline code. My concern is that if
we move it to a contrib, it'll just be code that languishes, especially given
the distributed limitations. Would it be better just to use the pluggable
framework? It seems to me that the use case for single-shard analytics is
becoming less compelling, but that may be a misperception on my part.

I don't want it to seem like any decision has been made here. It's more that I
don't want to introduce this much code into the mainline tree if it doesn't
have wide applicability, and I think the lack of distributed support severely
limits how widely it applies.

That said, I'm not dogmatically opposed either. But I'd like some sense of what 
others think about it.

> Analytics Component
> -------------------
>
>                 Key: SOLR-5302
>                 URL: https://issues.apache.org/jira/browse/SOLR-5302
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Steven Bower
>            Assignee: Erick Erickson
>             Fix For: 5.0
>
>         Attachments: SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, 
> SOLR-5302.patch, Search Analytics Component.pdf, Statistical Expressions.pdf, 
> solr_analytics-2013.10.04-2.patch
>
>
> This ticket is to track a "replacement" for the StatsComponent. The 
> AnalyticsComponent supports the following features:
> * All functionality of StatsComponent (SOLR-4499)
> * Field Faceting (SOLR-3435)
> ** Support for limit
> ** Sorting (bucket name or any stat in the bucket)
> ** Support for offset
> * Range Faceting
> ** Supports all options of standard range faceting
> * Query Faceting (SOLR-2925)
> * Ability to use overall/field facet statistics as input to range/query 
> faceting (i.e. calculate the min/max date and then facet over that range)
> * Support for more complex aggregate/mapping operations (SOLR-1622)
> ** Aggregations: min, max, sum, sum-of-squares, count, missing, stddev, mean, 
> median, percentiles
> ** Operations: negation, abs, add, multiply, divide, power, log, date math, 
> string reversal, string concat
> ** Easily pluggable framework to add additional operations
> * New / cleaner output format
> Outstanding Issues:
> * Multi-value field support for stats (supported for faceting)
> * Multi-shard support (may not be possible for some operations, e.g. median)
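The feature of feeding overall statistics into range faceting can be illustrated
with a small, hypothetical two-pass sketch in plain Java (again, not the
component's actual implementation): first compute min and max, then split
[min, max] into equal-width buckets and count values into them. Dates would work
the same way if represented as epoch milliseconds; the bucket count parameter and
the rule that the maximum value lands in the last bucket are assumptions made for
this example.

```java
import java.util.Arrays;

public class StatsDrivenRangeFacet {

    // Hypothetical two-pass range facet: the bounds come from the data's own
    // min/max rather than from user-supplied start/end parameters.
    static long[] rangeFacet(double[] values, int buckets) {
        // Pass 1: overall statistics (throws NoSuchElementException if values is empty).
        double min = Arrays.stream(values).min().orElseThrow();
        double max = Arrays.stream(values).max().orElseThrow();
        double gap = (max - min) / buckets;

        // Pass 2: count each value into an equal-width bucket over [min, max].
        long[] counts = new long[buckets];
        for (double v : values) {
            int i = gap == 0 ? 0 : (int) ((v - min) / gap);
            // The maximum value would index one past the end; keep it in the last bucket.
            counts[Math.min(i, buckets - 1)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] values = {3, 5, 7, 9, 11, 13};
        System.out.println(Arrays.toString(rangeFacet(values, 2)));
    }
}
```

For values {3, 5, 7, 9, 11, 13} and two buckets, min = 3, max = 13, the gap is 5,
and the counts come out as [3, 3].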



--
This message was sent by Atlassian JIRA
(v6.2#6252)
