[GitHub] flink pull request: [FLINK-2157] [ml] [WIP] Create evaluation fram...

thvasilo Fri, 26 Jun 2015 05:10:20 -0700

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/871


    [FLINK-2157] [ml] [WIP]  Create evaluation framework for ML library

    WIP PR for the model evaluation framework for FlinkML.
    
    The evaluation follows sklearn's paradigm: a Scorer object is created with
    a performance score (sklearn's metrics) and provides an evaluate function
    that takes a trained model and a test dataset and produces a score.
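As an illustration of this Scorer paradigm, here is a minimal plain-Python sketch. It is a hypothetical analogue, not FlinkML's Scala API: the real implementation works on Flink DataSets. The per-example `model.predict` method, `mean_squared_loss`, and the toy `ConstantModel` are illustrative assumptions that do not come from the PR.

```python
from typing import Callable, Iterable, List, Tuple

# A "performance score" maps (true values, predicted values) to one number.
Score = Callable[[List[float], List[float]], float]


class Scorer:
    """Hypothetical analogue of the PR's Scorer: it wraps a performance
    score and evaluates a trained model on a test dataset."""

    def __init__(self, score: Score) -> None:
        self.score = score

    def evaluate(self, model, test_data: Iterable[Tuple[List[float], float]]) -> float:
        pairs = list(test_data)
        truth = [label for _, label in pairs]
        # Assumption: the model exposes a per-example predict(features) method.
        predictions = [model.predict(features) for features, _ in pairs]
        return self.score(truth, predictions)


def mean_squared_loss(truth: List[float], predictions: List[float]) -> float:
    """Average of (y - y_hat)^2 over the test set."""
    return sum((t - p) ** 2 for t, p in zip(truth, predictions)) / len(truth)


class ConstantModel:
    """Toy stand-in for a trained model: always predicts the same value."""

    def __init__(self, value: float) -> None:
        self.value = value

    def predict(self, features: List[float]) -> float:
        return self.value


scorer = Scorer(mean_squared_loss)
test = [([1.0], 1.0), ([2.0], 3.0)]
print(scorer.evaluate(ConstantModel(2.0), test))  # (1 + 1) / 2 = 1.0
```

Keeping the metric separate from the Scorer means the same evaluate logic can be reused with any score function.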
    
    The performance scores and Scorer are implemented in the
    flink.ml.evaluation package. Currently we have the squared loss, the
    zero-one loss, the accuracy score for classification, and the R^2 score
    for regression.
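For reference, the four scores listed above can be written as plain functions over lists of true and predicted values. This sketch assumes the averaged forms that sklearn uses by default (mean squared error, normalized zero-one loss); the PR's own definitions may differ, for example by returning per-example losses. The function names are illustrative and not taken from the PR.

```python
from typing import Sequence


def squared_loss(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Mean of (y - y_hat)^2 over all examples."""
    return sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth)


def zero_one_loss(truth: Sequence, pred: Sequence) -> float:
    """Fraction of examples whose predicted label differs from the true one."""
    return sum(1 for t, p in zip(truth, pred) if t != p) / len(truth)


def accuracy_score(truth: Sequence, pred: Sequence) -> float:
    """Fraction of correctly predicted labels, i.e. 1 - zero-one loss."""
    return 1.0 - zero_one_loss(truth, pred)


def r2_score(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


print(zero_one_loss([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
```

Note that R^2 is 1.0 for a perfect fit, 0.0 for a model that always predicts the mean of the targets, and can be negative for worse models.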
    
    Finally, a score function has been added to regression algorithms (and
    will be added to classifiers as well). It provides an intuitive way to
    evaluate the performance of an algorithm without the need to create a
    Scorer, as per [FLINK-2108](https://issues.apache.org/jira/browse/FLINK-2108).
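The idea of a built-in score function can be sketched as a base class whose `score` method evaluates the model with a default metric, so no Scorer has to be built by hand. This is a hypothetical Python sketch: the class names are illustrative, and the choice of R^2 as the default metric is an assumption that mirrors sklearn's regressors, not something the PR states.

```python
from typing import List, Sequence, Tuple


def r2_score(truth: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


class Regressor:
    """Hypothetical base class: subclasses implement predict and inherit score."""

    def predict(self, features: List[float]) -> float:
        raise NotImplementedError

    def score(self, test_data: Sequence[Tuple[List[float], float]]) -> float:
        # Convenience wrapper: evaluate with an assumed default metric (R^2).
        truth = [label for _, label in test_data]
        pred = [self.predict(x) for x, _ in test_data]
        return r2_score(truth, pred)


class SlopeOnlyRegressor(Regressor):
    """Toy model y_hat = w * x[0] with a fixed weight w."""

    def __init__(self, w: float) -> None:
        self.w = w

    def predict(self, features: List[float]) -> float:
        return self.w * features[0]


data = [([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]
print(SlopeOnlyRegressor(2.0).score(data))  # 1.0 (perfect fit)
```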
    
    The PR currently includes some work from Mikio Braun on a linear
    regression solver, but that will be moved to a separate PR.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink evaluation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/871.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #871
    
----
commit ac373fb4af39d288c5b61bf1c86b1de5556748a6
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2015-06-02T12:34:27Z

    [FLINK-2116] [ml] Adds evaluate method to Predictor. Adds PredictOperation
    which can be reused by evaluate if the input data is of the format
    (TestingType, LabelType) where the second tuple field represents the true label.
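The evaluate behaviour described in this commit can be illustrated with lists instead of Flink DataSets: each (TestingType, LabelType) input is turned into a pair of true label and prediction, reusing the same per-example predict logic. This is a hypothetical Python sketch; the (truth, prediction) order of the output pairs and the toy `classify` function are assumptions for illustration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")  # testing (feature) type
L = TypeVar("L")  # label type


def evaluate(predict: Callable[[T], L],
             test_data: Iterable[Tuple[T, L]]) -> List[Tuple[L, L]]:
    """Pair each true label with the model's prediction for its features.

    The second tuple field of each input is treated as the true label,
    so the per-example predict function can be reused unchanged.
    """
    return [(label, predict(features)) for features, label in test_data]


def classify(x: float) -> int:
    """Toy threshold classifier on a single feature."""
    return 1 if x > 0.5 else 0


print(evaluate(classify, [(0.2, 0), (0.9, 1), (0.7, 0)]))
# [(0, 0), (1, 1), (0, 1)]
```

The resulting (truth, prediction) pairs are exactly what a performance score such as zero-one loss or R^2 consumes.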

commit 7133cafb643d545fa5c66bedc7d5eda847faeb62
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-09T11:25:34Z

    First working version of a simpler least squares implementation
    
    Not done any work integrating that with the Flink Pipeline stuff

commit f5315c0ce59b6a32c8aeb81ebba2a5982e981835
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-10T08:49:55Z

    reduce amount of toString computations for large collections

commit 74aafa00e7e61003e081f9b54697ee9904487544
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-12T15:18:39Z

    simple lsr into pipeline

commit f5c498ba1ba58a51f265f922fdce312518fcbf68
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-19T11:23:53Z

    working on the Simple LSR tests

commit f37c41fc1d0b959c60c3e06f7d4633b57a7b87ac
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-19T14:32:54Z

    slightly better checks in the SimpleLeastSquaresRegressionTest

commit aae27c2f25792143febb900a11f4980ca1159aae
Author: mikiobraun <mikiobr...@gmail.com>
Date:   2015-06-22T15:04:42Z

    Adding some first loss functions for the evaluation framework

commit 4d115f7db3e569655e2fb156f18ec897cd573089
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-23T14:07:48Z

    Scorer for evaluation

commit 1e7309d7ba2519e2520ed816456cfa2ca8e92510
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-25T09:41:10Z

    Adds accuracy score and R^2 score. Also trying out Scores as classes
    instead of functions.
    
    Not too happy with the extra boilerplate of Score as classes; will
    probably revert and have objects like RegressionsScores,
    ClassificationScores that contain the definitions of the relevant scores.

commit 3e275d567e2c4fe0b72875cfb54645dd346b4e22
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:30:56Z

    Adds an evaluate operation for LabeledVector input

commit 8c194be4a39170cb7f4865ae1dd39ebbeeddef7e
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-06-26T11:32:13Z

    Adds Regressor interface, and a score function for regression algorithms.

----


