[ https://issues.apache.org/jira/browse/SOLR-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Potter reassigned SOLR-14614:
-------------------------------------

    Assignee:     (was: Timothy Potter)

> Add Simplified Aggregation Interface to Streaming Expression
> ------------------------------------------------------------
>
>                 Key: SOLR-14614
>                 URL: https://issues.apache.org/jira/browse/SOLR-14614
>             Project: Solr
>          Issue Type: Improvement
>          Components: query, query parsers, streaming expressions
>    Affects Versions: 7.7.2, 8.4.1
>            Reporter: Aroop
>            Priority: Major
>
> For data analytics, the standard workflow is:
>  # Find a pattern
>  # Aggregate by certain dimensions
>  # Compute metrics (such as count, sum, avg)
>  # Sort by a dimension or metric
>  # Look at the top N results
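The five steps map onto a filter, group, aggregate, sort and limit pipeline. As a rough in-memory illustration only (this is not Solr code), the Java sketch below runs that pipeline over a list of documents. The `Doc` and `AnalyticsPipeline` classes, the `name`/`city` fields and the `topCitiesFor` method are hypothetical, and only the count metric is computed.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical document type with only the fields the example needs.
final class Doc {
    final String name;
    final String city;

    Doc(String name, String city) {
        this.name = name;
        this.city = city;
    }
}

// Hypothetical in-memory version of the find/aggregate/compute/sort/top-n steps.
final class AnalyticsPipeline {
    static List<Map.Entry<String, Long>> topCitiesFor(List<Doc> docs, String name, int n) {
        // 1. Find a pattern: keep only documents whose name matches.
        // 2 + 3. Aggregate by the "city" dimension and compute the count metric.
        Map<String, Long> counts = docs.stream()
                .filter(d -> d.name.equals(name))
                .collect(Collectors.groupingBy(d -> d.city, LinkedHashMap::new, Collectors.counting()));

        // 4. Sort by the metric (descending); ties are broken by city name for determinism.
        // 5. Keep the top n.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

For example, `AnalyticsPipeline.topCitiesFor(docs, "alex", 10)` mirrors the "top 10 cities where someone named Alex works" question used as the running example in this issue.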
> This kind of aggregation has been available in Solr through many different 
> interfaces, but only streaming expressions deliver results in a scalable, 
> performant and stable manner on systems with big-data-scale volumes.
> However, the streaming expression query interface is not simple enough, which 
> is a barrier to entry.
> To illustrate how involved the corresponding streaming expression can get when 
> it must work on large-scale systems, consider:{color:#4c9aff} _find the top 10 
> cities where someone named Alex works, with the respective counts_{color}
> {code:java}
> qt=/stream&aggregationMode=facet&expr=
> select(
>   top(
>     rollup(
>       sort(by="city asc",
>         plist(
>           select(facet(collection1, q="(*:* AND name:alex)", buckets="city", bucketSizeLimit="2010", bucketSorts="count(*) desc", count(*)), city, count(*) as Nj3bXa),
>           select(facet(collection2, q="(*:* AND name:alex)", buckets="city", bucketSizeLimit="2010", bucketSorts="count(*) desc", count(*)), city, count(*) as Nj3bXa)
>         )),
>       over="city", sum(Nj3bXa)),
>     n="10", sort="sum(Nj3bXa) desc"),
>   city, sum(Nj3bXa) as Nj3bXa)
> {code}
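To make the data flow of this expression easier to follow, here is a hedged in-memory Java sketch of what it computes, not of how Solr executes it. Each partition's `facet` output is modelled as a plain `city -> count` map and `plist` as simple concatenation; the `PartitionMerge` class and `mergeTopN` method are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical model of select(top(rollup(sort(plist(facet..., facet...)))))
final class PartitionMerge {
    // Each map stands in for one collection's facet(...) output: bucket (city) -> count(*).
    static List<Map.Entry<String, Long>> mergeTopN(List<Map<String, Long>> partitions, int n) {
        // plist(...) + sort(by="city asc"): gather every tuple; the TreeMap keeps cities sorted.
        // rollup(over="city", sum(...)): add up the per-partition counts for each city.
        TreeMap<String, Long> rolledUp = new TreeMap<>();
        for (Map<String, Long> facetBuckets : partitions) {
            facetBuckets.forEach((city, count) -> rolledUp.merge(city, count, Long::sum));
        }
        // top(n=..., sort="sum(...) desc"): keep the n cities with the largest summed count.
        // The sort is stable, so ties stay in ascending city order.
        return rolledUp.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

Summing the per-partition counts is valid here because the two collections hold disjoint partitions of the data.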
> This streaming expression queries an alias backed by two collections that 
> represent two data partitions, a common requirement in big data systems. It 
> is one of the few ways to get results from billions of records in a matter of 
> seconds, which is impressive in terms of both capability and performance.
> But this syntax is clearly involved, and that complexity is a barrier to entry 
> for new adopters.
>  
> This Jira tracks the work of creating a simplified analytics endpoint that 
> augments streaming expressions.
> As a starting proposal, the endpoint would accept these query parameters:
> {code:java}
> /analytics?action=aggregate&q=*:*&fq=name:alex&dimensions=city&metrics=count&sort=count&sortOrder=desc&limit=10{code}
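As a hedged illustration of how the proposed parameters could be read on the server side, the sketch below parses the example query string with `java.net.URLDecoder`. The `AggregateRequest` class and its fields are hypothetical, and so are these choices, which the proposal does not specify: comma-separated `dimensions`/`metrics`, `q` defaulting to `*:*`, `sort` defaulting to the first metric, `sortOrder` defaulting to `desc`, and `limit` defaulting to 10.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value object for the proposed /analytics?action=aggregate parameters.
final class AggregateRequest {
    final String q;
    final String fq;                // null when no filter query is given
    final List<String> dimensions;  // assumed comma-separated, e.g. dimensions=city,state
    final List<String> metrics;     // assumed comma-separated, e.g. metrics=count
    final String sort;              // assumed default: first metric
    final String sortOrder;         // assumed default: desc
    final int limit;                // assumed default: 10

    private AggregateRequest(Map<String, String> p) {
        q = p.getOrDefault("q", "*:*");
        fq = p.get("fq");
        dimensions = List.of(require(p, "dimensions").split(","));
        metrics = List.of(require(p, "metrics").split(","));
        sort = p.getOrDefault("sort", metrics.get(0));
        sortOrder = p.getOrDefault("sortOrder", "desc");
        limit = Integer.parseInt(p.getOrDefault("limit", "10"));
    }

    // Parses the part of the URL after '?', decoding %XX escapes and '+' as form encoding does.
    static AggregateRequest parse(String queryString) {
        Map<String, String> p = new HashMap<>();
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            p.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                  URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        if (!"aggregate".equals(p.get("action"))) {
            throw new IllegalArgumentException("expected action=aggregate");
        }
        return new AggregateRequest(p);
    }

    private static String require(Map<String, String> p, String key) {
        String v = p.get(key);
        if (v == null || v.isEmpty()) {
            throw new IllegalArgumentException("missing required parameter: " + key);
        }
        return v;
    }
}
```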
> The proposed request is equivalent to the following SQL that an analyst would write:
> {code:sql}
> select city, count(*) from collection where name = 'alex'
> group by city order by count(*) desc limit 10;{code}
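The correspondence between the proposed parameters and this SQL can be sketched as a small renderer. This is an illustration under assumptions: the `SqlEquivalent` class and `toSql` method are hypothetical, `fq` is assumed to be a single `field:value` filter, only `metrics=count` is handled, and no quoting or escaping of values is attempted.

```java
import java.util.List;

// Hypothetical helper: renders the analyst-facing SQL equivalent of an aggregate request.
final class SqlEquivalent {
    static String toSql(String collection, String fq, List<String> dimensions,
                        String metric, String sort, String sortOrder, int limit) {
        // Only metrics=count is sketched; it maps to count(*).
        if (!metric.equals("count")) {
            throw new IllegalArgumentException("only metrics=count is handled in this sketch");
        }
        String dims = String.join(", ", dimensions);
        StringBuilder sql = new StringBuilder()
                .append("select ").append(dims).append(", count(*) from ").append(collection);
        if (fq != null) {
            // Assumes a single field:value filter, rendered as field = 'value'.
            int colon = fq.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("expected fq of the form field:value");
            }
            sql.append(" where ").append(fq, 0, colon)
               .append(" = '").append(fq.substring(colon + 1)).append("'");
        }
        // sort=count refers to the metric, so it becomes count(*); anything else is a column name.
        String orderBy = sort.equals("count") ? "count(*)" : sort;
        sql.append(" group by ").append(dims)
           .append(" order by ").append(orderBy).append(' ').append(sortOrder)
           .append(" limit ").append(limit).append(';');
        return sql.toString();
    }
}
```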
> On the Solr side, the request would be translated into the best possible 
> streaming expression using *rollup, top, sort, plist*, etc., all done 
> transparently to the user.
> Here's to making the power of streaming expressions easier for everyone to use.
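A hedged sketch of such a translation, for the simplest case only (one dimension and the count metric), is shown below. It emits an expression with the same shape as the hand-written example in this issue. The `StreamExprBuilder` class, its `countByDimension` method, the `cnt` column alias, and combining `q` and `fq` as `(q AND fq)` are assumptions for illustration, not part of the proposal.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical translator from the proposed parameters to a streaming expression
// shaped like the hand-written one in this issue (count metric, one dimension only).
final class StreamExprBuilder {
    static String countByDimension(List<String> collections, String q, String fq,
                                   String dim, int limit, int bucketSizeLimit) {
        String query = fq == null ? q : "(" + q + " AND " + fq + ")";
        String alias = "cnt"; // hypothetical name for the per-partition count column
        // One facet per collection (data partition), each projected to (dim, cnt).
        String perPartition = collections.stream()
                .map(c -> String.format(
                        "select(facet(%s, q=\"%s\", buckets=\"%s\", bucketSizeLimit=\"%d\", "
                                + "bucketSorts=\"count(*) desc\", count(*)), %s, count(*) as %s)",
                        c, query, dim, bucketSizeLimit, dim, alias))
                .collect(Collectors.joining(", "));
        // plist opens the partition streams in parallel; sort + rollup sum the counts
        // per bucket; top keeps the `limit` largest sums.
        return String.format(
                "select(top(rollup(sort(by=\"%s asc\", plist(%s)), over=\"%s\", sum(%s)), "
                        + "n=\"%d\", sort=\"sum(%s) desc\"), %s, sum(%s) as %s)",
                dim, perPartition, dim, alias, limit, alias, dim, alias, alias);
    }
}
```

Choosing `bucketSizeLimit` (2010 in the hand-written example) and supporting other metrics or multiple dimensions are left open in this sketch.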
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
