[ https://issues.apache.org/jira/browse/SOLR-14614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Potter reassigned SOLR-14614:
-------------------------------------

    Assignee:     (was: Timothy Potter)

> Add Simplified Aggregation Interface to Streaming Expression
> ------------------------------------------------------------
>
>                 Key: SOLR-14614
>                 URL: https://issues.apache.org/jira/browse/SOLR-14614
>             Project: Solr
>          Issue Type: Improvement
>          Components: query, query parsers, streaming expressions
>    Affects Versions: 7.7.2, 8.4.1
>            Reporter: Aroop
>            Priority: Major
>
> For data analytics, the standard workflow is:
> # Find a pattern
> # Aggregate by certain dimensions
> # Compute metrics (such as count, sum, avg)
> # Sort by a dimension or metric
> # Look at the top n
> This functionality has been available through many different interfaces in
> Solr in the past, but only streaming expressions can deliver results in a
> scalable, performant and stable manner on systems holding big-data-scale
> volumes.
> However, one barrier to entry is that the streaming expression query
> interface is not simple enough.
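
For readers who want the five steps listed above in concrete form, here is a minimal sketch in plain Python over in-memory records. It only illustrates the semantics (filter, group, count, sort, top n) and says nothing about how Solr executes them; the function name `top_n_by_dimension` and the sample records are made up for illustration.

```python
from collections import Counter


def top_n_by_dimension(records, predicate, dimension, n):
    """Filter, group by one dimension, count, sort descending, keep the top n."""
    # Step 1: find a pattern (keep only the matching records).
    matches = (r for r in records if predicate(r))
    # Steps 2-3: aggregate by the dimension and compute the count metric.
    counts = Counter(r[dimension] for r in matches)
    # Steps 4-5: sort by the metric (ties broken by dimension value), keep top n.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


# Made-up sample data, purely for illustration.
records = [
    {"name": "alex", "city": "Austin"},
    {"name": "alex", "city": "Boston"},
    {"name": "alex", "city": "Austin"},
    {"name": "sam", "city": "Austin"},
]
print(top_n_by_dimension(records, lambda r: r["name"] == "alex", "city", 10))
# prints [('Austin', 2), ('Boston', 1)]
```

On a single machine this is trivial; the point of the issue is that the same five steps become hard to express once the data is spread over several collections.
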
> To give an example of how involved the corresponding streaming expression
> can get when it has to work on large-scale systems, consider
> {color:#4c9aff}_find the top 10 cities where someone named Alex works, with
> the respective counts_{color}:
> {code:java}
> qt=/stream&aggregationMode=facet&expr=
> select( top( rollup(sort(by%3D"city+asc",
> +plist(
> select(facet(collection1,+q%3D"(*:*+AND+name:alex)",+buckets%3D"city",+bucketSizeLimit%3D"2010",+bucketSorts%3D"count(*)+desc",+count(*)),+city,+count(*)+as+Nj3bXa),
> select(facet(collection2,+q%3D"(*:*+AND+name:alex)",+buckets%3D"city",+bucketSizeLimit%3D"2010",+bucketSorts%3D"count(*)+desc",+count(*)),+city,+count(*)+as+Nj3bXa)
> )),
> +over%3D"city",+sum(Nj3bXa)),
> +n%3D"10",+sort%3D"sum(Nj3bXa)+desc"),
> +city,+sum(Nj3bXa)+as+Nj3bXa)
> {code}
> This query runs against an alias with two collections behind it,
> representing two data partitions; partitioning like this is more or less a
> requirement in big data systems. It is one of the only ways to get
> information from billions of records in a matter of seconds. This is awesome
> in terms of capability and performance.
> But one can see how involved the syntax is under the current scheme, and
> that is a barrier to entry for new adopters.
>
> This Jira tracks the work of creating a simplified analytics endpoint that
> augments streaming expressions.
> A starting proposal is for the endpoint to accept these query parameters:
> {code:java}
> /analytics?action=aggregate&q=*:*&fq=name:alex&dimensions=city&metrics=count&sort=count&sortOrder=desc&limit=10{code}
> This is equivalent to the SQL an analyst would write:
> {code:java}
> select city, count(*) from collection where name = 'alex'
> group by city order by count(*) desc limit 10;{code}
> On the Solr side this would be translated into the best possible streaming
> expression using *rollup, top, sort, plist* etc., all done transparently to
> the user.
> Here's to making the power of streaming expressions simpler to use for all.
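
To make the proposed translation concrete, here is a rough sketch of how the `/analytics` query parameters could be turned into an expression with the same shape as the one quoted above. This is not Solr code and not the design the issue settled on: `build_aggregate_expr`, the `cnt` alias and the list of collections behind the alias are assumptions made for illustration, the `bucketSizeLimit` default of 2010 is simply copied from the example, and only `metrics=count` with a single dimension is handled. The expression is returned unencoded; URL encoding is left to the HTTP client.

```python
from urllib.parse import parse_qs


def build_aggregate_expr(query_string, collections, bucket_size_limit=2010, alias="cnt"):
    """Translate the proposed /analytics parameters into one streaming expression.

    Hypothetical helper: handles only action=aggregate, metrics=count, sort=count.
    """
    params = {k: v[0] for k, v in parse_qs(query_string).items()}
    if (params.get("action") != "aggregate"
            or params.get("metrics") != "count"
            or params.get("sort", "count") != "count"):
        raise ValueError("this sketch only handles action=aggregate, metrics=count, sort=count")

    dim = params["dimensions"]
    order = params.get("sortOrder", "desc")
    limit = int(params.get("limit", "10"))

    # Combine q and fq the way the quoted example does: (q AND fq).
    q = params.get("q", "*:*")
    if "fq" in params:
        q = f"({q} AND {params['fq']})"

    # One facet per collection (data partition), each emitting dim + partial count.
    partials = [
        f'select(facet({c}, q="{q}", buckets="{dim}", '
        f'bucketSizeLimit="{bucket_size_limit}", bucketSorts="count(*) {order}", count(*)), '
        f'{dim}, count(*) as {alias})'
        for c in collections
    ]
    # Run the partitions in parallel, order by the dimension so rollup can
    # sum the partial counts per dimension value, then keep the top n.
    merged = f'sort(by="{dim} asc", plist({", ".join(partials)}))'
    rolled = f'rollup({merged}, over="{dim}", sum({alias}))'
    top = f'top({rolled}, n="{limit}", sort="sum({alias}) {order}")'
    return f'select({top}, {dim}, sum({alias}) as {alias})'


expr = build_aggregate_expr(
    "action=aggregate&q=*:*&fq=name:alex&dimensions=city"
    "&metrics=count&sort=count&sortOrder=desc&limit=10",
    ["collection1", "collection2"],
)
print(expr)
```

The nesting follows the quoted example from the inside out: per-collection `facet`, `plist` to fan out, `sort` plus `rollup` to merge partial counts, and `top` to cut the result to the requested limit. A real implementation would also need to choose the aggregation mode and the per-partition bucket limit, which this sketch does not attempt.
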