GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3631

    Imputer 

    Provides an imputer method that fills in missing values in a sparse
    DataSet of vectors. Missing values can be filled with the mean, the
    median, or the most frequent value of each row or, optionally, each
    column. This way, incomplete data does not have to be thrown away and
    can instead be used to train a machine learning algorithm.
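
    The three fill strategies named above can be illustrated with a minimal
    sketch. This is not the code from the pull request: it uses plain Scala
    collections instead of a Flink DataSet, it assumes that Double.NaN marks
    a missing entry, and every name in it (ImputerSketch, Strategy,
    imputeWithinVectors, imputeAcrossVectors) is hypothetical.

```scala
object ImputerSketch {
  // Hypothetical strategy names; the PR may expose these differently.
  sealed trait Strategy
  case object Mean extends Strategy
  case object Median extends Strategy
  case object MostFrequent extends Strategy

  // Assumption: Double.NaN marks a missing entry.
  private def fillValue(observed: Seq[Double], strategy: Strategy): Double =
    if (observed.isEmpty) Double.NaN // nothing observed, so the gap stays
    else strategy match {
      case Mean => observed.sum / observed.size
      case Median =>
        val s = observed.sortWith(_ < _)
        val mid = s.size / 2
        if (s.size % 2 == 1) s(mid) else (s(mid - 1) + s(mid)) / 2.0
      case MostFrequent =>
        // Ties are broken in favour of the smallest value.
        observed.groupBy(identity).toSeq
          .map { case (v, occurrences) => (v, occurrences.size) }
          .sortWith { case ((v1, n1), (v2, n2)) => n1 > n2 || (n1 == n2 && v1 < v2) }
          .head._1
    }

  /** Fills each vector's gaps from that vector's own observed entries. */
  def imputeWithinVectors(data: Seq[Array[Double]], strategy: Strategy): Seq[Array[Double]] =
    data.map { vec =>
      val fill = fillValue(vec.toSeq.filterNot(_.isNaN), strategy)
      vec.map(x => if (x.isNaN) fill else x)
    }

  /** Fills gaps at each index from the same index of all vectors (equal lengths assumed). */
  def imputeAcrossVectors(data: Seq[Array[Double]], strategy: Strategy): Seq[Array[Double]] = {
    val dims = if (data.isEmpty) 0 else data.head.length
    val fills = Array.tabulate(dims) { j =>
      fillValue(data.map(_(j)).filterNot(_.isNaN), strategy)
    }
    data.map(vec => vec.zipWithIndex.map { case (x, j) => if (x.isNaN) fills(j) else x })
  }
}
```

    For example, ImputerSketch.imputeAcrossVectors(data, ImputerSketch.Median)
    would fill each gap with the median of the values observed at the same
    index in the other vectors. Which of the two functions corresponds to
    "row" and which to "column" in the PR depends on how the DataSet is laid
    out, which this sketch does not assume.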
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3631
    
----
commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:47:39Z

    Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:49:59Z

    Test for Imputer class
    
    Two test classes that test the functions implemented in the new imputer
    class: one for row-wise imputing over all vectors and one for
    vector-wise imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T09:56:51Z

    removed unused imports and methods

commit 9136607e84a0297bb4fb24a53bad9950b86bf116
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-27T15:58:37Z

    Imputer was added
    
    adds missing values in sparse DataSets of Vectors

commit ee2e753780aa7bce386fbae17ca684c2f0d94eef
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:29:56Z

    added apache license

commit b6d52fc67937b9ff21911306be015a979a12d8e7
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:30:19Z

    added apache license

commit 986617fbe937d9103e8336dedbdd6c5f7922e78d
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:30:48Z

    added apache license

commit 6aac7182e8654d5fba7ed2362a42b803a0820d17
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T07:31:12Z

    added apache license

commit 40ea05f33e12fa3a61a01d938a10c6bd736e2d5c
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:19:09Z

    deleted unnecessary file

commit 36dc30c054335c1c88a0fb1f73e52591c0d24061
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:19:26Z

    deleted unnecessary file

commit e2a6f542828b4f613b3e1db63c7ab64475974b99
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:22:15Z

    deleted unnecessary file

commit caea8b12098f2094c79043c220ade9192f95bd80
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:35:07Z

    Update MLUtils.scala

commit 0f1ca00bf17ca5c61281e5a4473c201567d84262
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:50:30Z

    Imputer for sparse DataSets of Vectors
    
    Imputes missing values with the mean, median or most frequent value of a 
row or column

commit 1db58b3be6d8da4ca7e07ac65574d92c1c65482b
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:51:05Z

    Tests for new imputer class

commit 6b78f14af38722153eabea4d1721d88c99101f40
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:53:09Z

    false destination

commit c5eeb96ebe7463b09177812baf4a39c45a7ff7b1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-28T11:53:27Z

    false destination

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
