GitHub user p4nna opened a pull request: https://github.com/apache/flink/pull/3631

    Imputer

    Provides an imputer method that fills in missing values in a sparse
    DataSet of vectors. Missing values can be replaced with the mean, the
    median, or the most frequent value of each row or, optionally, each
    column. This way, incomplete data does not have to be thrown away and
    can instead be used to train a machine learning algorithm.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3631

----
commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:47:39Z

    Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:49:59Z

    Test for Imputer class

    Two testclasses which test the functions implemented in the new imputer class.
    One for the rowwise imputing over all vectors and one for the vectorwise imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:56:51Z

    removed unused imports and methods

commit 9136607e84a0297bb4fb24a53bad9950b86bf116
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T15:58:37Z

    Imputer was added

    adds missing values in sparse DataSets of Vectors

commit ee2e753780aa7bce386fbae17ca684c2f0d94eef
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:29:56Z

    added apache license

commit b6d52fc67937b9ff21911306be015a979a12d8e7
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:30:19Z

    added apache license

commit 986617fbe937d9103e8336dedbdd6c5f7922e78d
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:30:48Z

    added apache license

commit 6aac7182e8654d5fba7ed2362a42b803a0820d17
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:31:12Z

    added apache license

commit 40ea05f33e12fa3a61a01d938a10c6bd736e2d5c
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:19:09Z

    deleted unnecessary file

commit 36dc30c054335c1c88a0fb1f73e52591c0d24061
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:19:26Z

    deleted unnecessary file

commit e2a6f542828b4f613b3e1db63c7ab64475974b99
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:22:15Z

    deleted unnecessary file

commit caea8b12098f2094c79043c220ade9192f95bd80
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:35:07Z

    Update MLUtils.scala

commit 0f1ca00bf17ca5c61281e5a4473c201567d84262
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:50:30Z

    Imputer for sparse DataSets of Vectors

    Imputes missing values with the mean, median or most frequent value of a row or column

commit 1db58b3be6d8da4ca7e07ac65574d92c1c65482b
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:51:05Z

    Tests for new imputer class

commit 6b78f14af38722153eabea4d1721d88c99101f40
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:53:09Z

    false destination

commit c5eeb96ebe7463b09177812baf4a39c45a7ff7b1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:53:27Z

    false destination

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
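
For readers unfamiliar with imputation, the three strategies named in the pull request description (mean, median, most frequent value, computed per row or per column) can be illustrated with a minimal sketch. This is plain Python, not the code from the pull request and not the Flink ML API. The function names, the use of None/NaN as the missing-value marker, and the reading of "row" as "one vector" and "column" as "the same index across all vectors" are assumptions made only for this illustration.

```python
import math
from collections import Counter
from statistics import mean, median


def _is_missing(x):
    # Assumption: a missing entry is encoded as None or NaN.
    return x is None or (isinstance(x, float) and math.isnan(x))


def _fill_value(values, strategy):
    # Compute the replacement value from the observed (non-missing) entries.
    observed = [v for v in values if not _is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    if strategy == "mean":
        return mean(observed)
    if strategy == "median":
        return median(observed)
    if strategy == "most_frequent":
        # Ties are resolved in favour of the value seen first.
        return Counter(observed).most_common(1)[0][0]
    raise ValueError(f"unknown strategy: {strategy}")


def _impute_one(values, strategy):
    # Replace every missing entry of one row (or column) with the fill value.
    if not any(_is_missing(v) for v in values):
        return list(values)
    fill = _fill_value(values, strategy)
    return [fill if _is_missing(v) else v for v in values]


def impute(rows, strategy="mean", axis="row"):
    # Hypothetical helper: rows is a list of equal-length lists (the vectors).
    if axis == "row":
        return [_impute_one(r, strategy) for r in rows]
    if axis == "column":
        columns = [list(c) for c in zip(*rows)]
        filled = [_impute_one(c, strategy) for c in columns]
        return [list(r) for r in zip(*filled)]
    raise ValueError(f"unknown axis: {axis}")


data = [[1.0, None, 3.0],
        [4.0, 6.0, None]]
by_row = impute(data, strategy="mean", axis="row")
by_column = impute(data, strategy="mean", axis="column")
```

With axis="row" each vector is completed from its own observed entries; with axis="column" the fill value comes from the same position in the other vectors. The sketch works on dense Python lists for readability; the sparse, distributed DataSet handling described in the pull request is out of its scope.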