[
https://issues.apache.org/jira/browse/SOLR-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joel Bernstein updated SOLR-10651:
----------------------------------
Fix Version/s: (was: 7.0)
               7.1
               master (8.0)
> Streaming Expressions statistical functions library
> ---------------------------------------------------
>
> Key: SOLR-10651
> URL: https://issues.apache.org/jira/browse/SOLR-10651
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
>
> This is a ticket for organizing the new statistical programming features of
> Streaming Expressions. It's also a place for the community to discuss what
> functions are needed to support statistical programming.
> Basic Syntax:
> {code}
> let(a = timeseries(...),
>     b = timeseries(...),
>     c = col(a, count(*)),
>     d = col(b, count(*)),
>     r = regress(c, d),
>     tuple(p = predict(r, 50)))
> {code}
> The expression above is doing the following:
> 1) The let expression is setting variables (a, b, c, d, r).
> 2) Variables *a* and *b* are the output of timeseries() Streaming
> Expressions. These will be stored in memory as lists of Tuples containing the
> time series results.
> 3) Variables *c* and *d* are set using the *col* evaluator. The col evaluator
> extracts a column of numbers from a list of tuples. In the example *col* is
> extracting the count(*) field from the two time series result sets.
> 4) Variable *r* is the output from the *regress* evaluator. The regress
> evaluator performs a simple regression analysis on two columns of numbers.
> 5) Once the variables are set, the *let* expression runs a single Streaming
> Expression. In the example, that is the *tuple* expression, which outputs a
> single Tuple of name/value pairs. Because *let* can run any Streaming
> Expression, the program can be arbitrarily complex, and the expression it
> runs has access to all the variables defined earlier.
> 6) The tuple expression in the example has one name/value pair. The name
> *p* is set to the output of the *predict* evaluator. The predict evaluator
> predicts the value of the dependent variable for an independent variable
> value of 50, using the regression result stored in variable *r*.
> 7) The output of this expression will be a single tuple with the value of the
> predict function in the *p* field.
> The growing list of issues linked to this ticket covers the array manipulation
> and statistical functions that will form the basis of the stats library. The
> vast majority of these functions are backed by algorithms in Apache Commons
> Math. Other machine learning and math libraries will follow.
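
The steps described in the quoted example can be sketched outside Solr to show
what the pipeline computes. The following is a minimal illustration in plain
Python, not Solr code and not the Commons Math implementation: the functions
col, regress and predict are hypothetical stand-ins named after the evaluators,
the input lists are made-up sample data standing in for the timeseries()
results, and it assumes the first argument to regress is the independent
variable.

```python
# Illustrative sketch only: plain-Python stand-ins for the col, regress and
# predict evaluators described above. This is not Solr or Commons Math code.

def col(tuples, field):
    """Extract one numeric column from a list of dict-like tuples."""
    return [float(t[field]) for t in tuples]

def regress(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Assumption: x is the independent variable, y the dependent variable.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length with >= 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("independent variable has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict(r, value):
    """Predict the dependent variable for one independent value."""
    return r["intercept"] + r["slope"] * value

# Made-up sample data standing in for the two timeseries() results.
a = [{"count(*)": 10}, {"count(*)": 20}, {"count(*)": 30}]
b = [{"count(*)": 25}, {"count(*)": 45}, {"count(*)": 65}]

c = col(a, "count(*)")             # step 3: extract the columns
d = col(b, "count(*)")
r = regress(c, d)                  # step 4: simple regression
result = {"p": predict(r, 50)}     # steps 5-7: a single tuple with field p
print(result)
```

With this sample data the second column is exactly 2x + 5 of the first, so the
fitted line is easy to check by hand.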
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)