GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/871

    [FLINK-2157] [ml] [WIP] Create evaluation framework for ML library

    WIP PR for the model evaluation framework for FlinkML.

    The evaluation follows sklearn's paradigm: a Scorer object is created with a performance score (one of sklearn's metrics) and provides an evaluate function that takes a trained model and a test dataset and produces a score.

    The performance scores and the Scorer are implemented in the flink.ml.evaluation package. Currently we have squared loss, zero-one loss, an accuracy score for classification, and an R^2 score for regression.

    Finally, a score function has been added to regression algorithms (and will be added to classifiers as well). It provides an intuitive way to evaluate the performance of an algorithm without the need to create a Scorer, as per [FLINK-2108](https://issues.apache.org/jira/browse/FLINK-2108).

    The PR currently includes some work from Mikio Braun on a linear regression solver, but that will be moved to a separate PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink evaluation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/871.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #871

----
commit ac373fb4af39d288c5b61bf1c86b1de5556748a6
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor.
    Adds PredictOperation which can be reused by evaluate if the input data is of the format (TestingType, LabelType), where the second tuple field represents the true label.

commit 7133cafb643d545fa5c66bedc7d5eda847faeb62
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-09T11:25:34Z

    First working version of a simpler least squares implementation
    Not done any work integrating that with the Flink Pipeline stuff

commit f5315c0ce59b6a32c8aeb81ebba2a5982e981835
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-10T08:49:55Z

    reduce amount of toString computations for large collections

commit 74aafa00e7e61003e081f9b54697ee9904487544
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-12T15:18:39Z

    simple lsr into pipeline

commit f5c498ba1ba58a51f265f922fdce312518fcbf68
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-19T11:23:53Z

    working on the Simple LSR tests

commit f37c41fc1d0b959c60c3e06f7d4633b57a7b87ac
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-19T14:32:54Z

    slightly better checks in the SimpleLeastSquaresRegressionTest

commit aae27c2f25792143febb900a11f4980ca1159aae
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-22T15:04:42Z

    Adding some first loss functions for the evaluation framework

commit 4d115f7db3e569655e2fb156f18ec897cd573089
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-23T14:07:48Z

    Scorer for evaluation

commit 1e7309d7ba2519e2520ed816456cfa2ca8e92510
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-25T09:41:10Z

    Adds accuracy score and R^2 score. Also trying out Scores as classes instead of functions.
    Not too happy with the extra boilerplate of Score as classes; will probably revert and have objects like RegressionsScores, ClassificationScores that contain the definitions of the relevant scores.

commit 3e275d567e2c4fe0b72875cfb54645dd346b4e22
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:30:56Z

    Adds an evaluate operation for LabeledVector input

commit 8c194be4a39170cb7f4865ae1dd39ebbeeddef7e
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:32:13Z

    Adds Regressor interface, and a score function for regression algorithms.

----
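
For readers unfamiliar with the sklearn-style paradigm mentioned in the pull request summary, the following is a minimal sketch in plain Python. It is not the FlinkML API and is not taken from the PR. The names `Scorer`, `evaluate`, and `score` follow the description, but their signatures here are assumptions, as are averaging the losses over the test set, using R^2 as the default for `score`, and the hypothetical `ConstantModel` stand-in for a trained model. The test data uses (features, true label) pairs, mirroring the (TestingType, LabelType) format mentioned in the first commit.

```python
from typing import Callable, List, Sequence, Tuple

# A (true label, prediction) pair, and a performance score over such pairs.
Pair = Tuple[float, float]
Score = Callable[[Sequence[Pair]], float]


def squared_loss(pairs: Sequence[Pair]) -> float:
    """Squared error, averaged over the test set (averaging is an assumption)."""
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def zero_one_loss(pairs: Sequence[Pair]) -> float:
    """Fraction of predictions that differ from the true label."""
    return sum(1 for t, p in pairs if t != p) / len(pairs)


def accuracy_score(pairs: Sequence[Pair]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def r2_score(pairs: Sequence[Pair]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when all true labels are identical.
    """
    mean_true = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_true) ** 2 for t, _ in pairs)
    return 1.0 - ss_res / ss_tot


class Scorer:
    """Wraps a performance score and applies it to a trained model."""

    def __init__(self, score: Score) -> None:
        self.score = score

    def evaluate(self, model, test_data: Sequence[Tuple[List[float], float]]) -> float:
        # Each test example is (features, true label).
        pairs = [(label, model.predict(features)) for features, label in test_data]
        return self.score(pairs)


class Regressor:
    """Mixin giving regression models a score shortcut (R^2 default is an assumption)."""

    def score(self, test_data: Sequence[Tuple[List[float], float]]) -> float:
        return Scorer(r2_score).evaluate(self, test_data)


class ConstantModel(Regressor):
    """Hypothetical stand-in for a trained model: always predicts one value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, features: List[float]) -> float:
        return self.value


if __name__ == "__main__":
    test_data = [([1.0], 1.0), ([2.0], 2.0), ([3.0], 3.0)]
    model = ConstantModel(2.0)
    print(Scorer(squared_loss).evaluate(model, test_data))
    print(model.score(test_data))
```

In this sketch the scores are plain functions rather than classes, which corresponds to one of the two designs weighed in the commit messages above; the `score` shortcut simply builds a `Scorer` internally so callers do not have to.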