[ 
https://issues.apache.org/jira/browse/SOLR-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-8965:
------------------------------------
    Component/s: streaming expressions

> Add Path reduce operation to aggregate paths in a session
> ---------------------------------------------------------
>
>                 Key: SOLR-8965
>                 URL: https://issues.apache.org/jira/browse/SOLR-8965
>             Project: Solr
>          Issue Type: New Feature
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>            Priority: Major
>
> Session aggregation is hard to do at scale, but MapReduce makes it 
> straightforward. Now that Solr has MapReduce, it would be good to add some 
> session aggregations to the base library.
>
> The Path reduce operation can be used with the *reduce* function to 
> concatenate the path taken in a session into a single field. The resulting 
> path records can then be added to another SolrCloud collection using the 
> update stream. Once they have been consolidated in that collection, 
> aggregations can be run on the paths using the RollupStream.
>
> A HashRollupStream could also be developed to aggregate the paths as they are 
> reduced. The HashRollupStream would keep all the paths in a hash map during 
> the aggregation, so it would not require the paths to be received in order.
>
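
The hash-map idea above can be sketched in a few lines of Python. This is only an illustration of the technique, not the proposed HashRollupStream: the `hash_rollup` function, the `path` field name, and the `/` separator are all assumptions that the issue does not specify.

```python
from collections import defaultdict

def hash_rollup(path_records, over="path"):
    """Count records per distinct path, whatever order they arrive in."""
    counts = defaultdict(int)  # path -> number of sessions that took it
    for record in path_records:
        counts[record[over]] += 1
    return dict(counts)

# Hypothetical path records, deliberately not sorted by path.
records = [
    {"sessionId": "s1", "path": "home/search/item"},
    {"sessionId": "s2", "path": "home/cart"},
    {"sessionId": "s3", "path": "home/search/item"},
]
rollup = hash_rollup(records)
```

Because every distinct path stays in the map until the input is exhausted, memory grows with the number of distinct paths. That is the trade-off against a rollup that relies on sorted input.
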
> Sample syntax:
> {code}
> reduce(search(logs, q="*:*", sort="sessionId, timestamp", ...),
>        by="sessionId",
>        path(field="pageId"))
> {code}
> This would work well in parallel when the stream is partitioned on 
> sessionId, because every record for a session then reaches the same worker.
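
To make the intended behaviour concrete, here is a rough Python model of the reduce-by-session step and of partitioning on sessionId. It does not use any Solr API: `path_reduce`, `partition`, the `/` separator, the `path` output field, and the sample records are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter
from zlib import crc32

def path_reduce(sorted_records, by="sessionId", field="pageId", sep="/"):
    """Collapse each run of records sharing `by` into one path record.

    The input must already be sorted by (`by`, timestamp), as in the sample.
    """
    reduced = []
    for key, group in groupby(sorted_records, key=itemgetter(by)):
        reduced.append({by: key, "path": sep.join(r[field] for r in group)})
    return reduced

def partition(records, workers, by="sessionId"):
    """Route each record to a worker using a stable hash of the session key."""
    shards = [[] for _ in range(workers)]
    for r in records:
        shards[crc32(r[by].encode()) % workers].append(r)
    return shards

# Hypothetical log records, in arrival order.
logs = [
    {"sessionId": "s1", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s1", "timestamp": 2, "pageId": "search"},
    {"sessionId": "s2", "timestamp": 1, "pageId": "home"},
    {"sessionId": "s2", "timestamp": 3, "pageId": "cart"},
    {"sessionId": "s1", "timestamp": 5, "pageId": "item"},
]
sort_key = itemgetter("sessionId", "timestamp")

# Single worker: sort, then reduce.
serial = path_reduce(sorted(logs, key=sort_key))

# Several workers: partition on sessionId, reduce each shard independently.
parallel = []
for shard in partition(logs, workers=2):
    parallel.extend(path_reduce(sorted(shard, key=sort_key)))
parallel.sort(key=itemgetter("sessionId"))
```

Since a session never spans two shards, the per-shard results can simply be concatenated, and `parallel` matches `serial`.
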



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

