[ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951183#comment-15951183 ]
ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3659

    [FLINK-5785] Add an Imputer for preparing data

    Adds an imputer class, including tests, that imputes values into sparse DataSets of Vectors. One can choose whether the median, the mean, or the most frequent value of a vector or row should be inserted.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3659

----
commit 88514a98642763c5ad962efecc44bef887b84110
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:00:33Z

    Added an imputer class with Strategy class

    The imputer imputes missing values into a sparse DataSet of Vectors with different strategies which can be chosen out of the existing ones in the strategy enum class (mean, median or most frequent value) in a row or column

commit d17c6de2ad9456a58d24ac4cda44b5ef5ce5c216
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:01:47Z

    deleted class in false destination

commit e4b336fdbf93084c30a8ee0067efcd7a4729c0e1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:02:07Z

    deleted class in false destination

commit ee6d57cfa669876f983cbf10eb6ffdd02b5c3052
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:04:04Z

    added imputer class with strategy class

    the imputer imputes values into a sparse DataSet of vectors with different strategies (mean, median or most frequent value as listed in the strategy class)

commit 57524586cbd63e2f0dfdc70cb34df82e6451c3be
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:04:47Z

    added a test class for the new imputer class

commit 72ebd5e210f583cd7e8df21ea8d73c06e835e198
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:08:49Z

    [FLINK-5785] Add an Imputer for preparing data

    removed unnecessary things and comments, added license

commit 31dbfc704247b0c4723d6d3091a16759fbe18041
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:09:26Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit d0f7b816bea49090633b4bc85762bbf70b192b27
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:10:03Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit 76f996e2ddc5d912c947f20e2109bd53973c8091
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:10:33Z

    [FLINK-5785] Add an Imputer for preparing data

    added license

commit d533805c7b37888632238ce87e73e6ef9d081d02
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:54:37Z

    [FLINK-5785] Add an Imputer for preparing data

    should work now.

commit 10dcdfab0ea27e6191cf6d0efad05a563f389ba4
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:04Z

    [FLINK-5785] Add an Imputer for preparing data

    was in wrong place

commit 8e67f01ba1fb707b808473f4961902542aaca369
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:21Z

    [FLINK-5785] Add an Imputer for preparing data

    was in wrong place

commit c3fdc87e0e9fc07785b4b4b8dc2b1fde4c756d35
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:59Z

    [FLINK-5785] Add an Imputer for preparing data

    should work now

commit 07507b5ca0f1cfebc38f96bb8db32c10f2186bbf
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:57:37Z

    [FLINK-5785] Add an Imputer for preparing data

    tests should work now

----


> Add an Imputer for preparing data
> ---------------------------------
>
>                 Key: FLINK-5785
>                 URL: https://issues.apache.org/jira/browse/FLINK-5785
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Stavros Kontopoulos
>            Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values,
> either using the mean, the median or the most frequent value of the row or
> column in which the missing values are located. This class also allows for
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)