[ 
https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951183#comment-15951183
 ] 

ASF GitHub Bot commented on FLINK-5785:
---------------------------------------

GitHub user p4nna opened a pull request:

    https://github.com/apache/flink/pull/3659

    [FLINK-5785] Add an Imputer for preparing data

    Adds an Imputer class, with tests, that imputes missing values into
    sparse DataSets of Vectors. The inserted value can be the mean, the
    median or the most frequent value of a vector or row.
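
For readers unfamiliar with the three strategies, here is a minimal sketch in plain Python. It is not the code from this pull request and does not use Flink's DataSet or Vector types. The names `impute`, `_fill` and `REDUCERS`, and the choice of `None` as the missing-value marker, are assumptions made only for illustration.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical strategy table; ties in "most_frequent" resolve to the
# value seen first, which is simply what Counter.most_common does.
REDUCERS = {
    "mean": mean,
    "median": median,
    "most_frequent": lambda xs: Counter(xs).most_common(1)[0][0],
}


def _fill(values, reducer, is_missing):
    """Replace missing entries of one row or column with reducer(observed)."""
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = reducer(observed)
    return [fill if is_missing(v) else v for v in values]


def impute(rows, strategy="mean", axis="column",
           is_missing=lambda v: v is None):
    """Return a copy of `rows` with missing entries filled in.

    axis="column" computes the fill value per column, axis="row" per row.
    `is_missing` lets the caller pick the missing-value encoding.
    """
    reducer = REDUCERS[strategy]
    if axis == "row":
        return [_fill(list(r), reducer, is_missing) for r in rows]
    columns = [_fill(list(c), reducer, is_missing) for c in zip(*rows)]
    # Transpose back so the result has the same row layout as the input.
    return [list(r) for r in zip(*columns)]


data = [[1.0, None, 3.0],
        [4.0, 2.0, None],
        [7.0, 2.0, 9.0]]
print(impute(data, strategy="mean", axis="column"))
print(impute(data, strategy="median", axis="row"))
```

Passing a different `is_missing` predicate (for example `math.isnan`) is one way to support other missing-value encodings, as the quoted issue description asks for.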

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink imputer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3659
    
----
commit 88514a98642763c5ad962efecc44bef887b84110
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:00:33Z

    Added an imputer class with Strategy class
    
    The imputer fills missing values in a sparse DataSet of Vectors using
    a strategy chosen from the strategy enum class (mean, median or most
    frequent value), computed over a row or column

commit d17c6de2ad9456a58d24ac4cda44b5ef5ce5c216
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:01:47Z

    deleted class in wrong location

commit e4b336fdbf93084c30a8ee0067efcd7a4729c0e1
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:02:07Z

    deleted class in wrong location

commit ee6d57cfa669876f983cbf10eb6ffdd02b5c3052
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:04:04Z

    added imputer class with strategy class
    
    the imputer imputes values into a sparse DataSet of vectors with
    different strategies (mean, median or most frequent value, as listed
    in the strategy class)

commit 57524586cbd63e2f0dfdc70cb34df82e6451c3be
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:04:47Z

    added a test class for the new imputer class

commit 72ebd5e210f583cd7e8df21ea8d73c06e835e198
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:08:49Z

    [FLINK-5785] Add an Imputer for preparing data
    
    removed unnecessary things and comments, added license

commit 31dbfc704247b0c4723d6d3091a16759fbe18041
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:09:26Z

    [FLINK-5785] Add an Imputer for preparing data
    
    added license

commit d0f7b816bea49090633b4bc85762bbf70b192b27
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:10:03Z

    [FLINK-5785] Add an Imputer for preparing data
    
    added license

commit 76f996e2ddc5d912c947f20e2109bd53973c8091
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-30T08:10:33Z

    [FLINK-5785] Add an Imputer for preparing data
    
    added license

commit d533805c7b37888632238ce87e73e6ef9d081d02
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:54:37Z

    [FLINK-5785] Add an Imputer for preparing data 
    
    should work now.

commit 10dcdfab0ea27e6191cf6d0efad05a563f389ba4
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:04Z

    [FLINK-5785] Add an Imputer for preparing data
    
    was in wrong place

commit 8e67f01ba1fb707b808473f4961902542aaca369
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:21Z

    [FLINK-5785] Add an Imputer for preparing data
    
    was in wrong place

commit c3fdc87e0e9fc07785b4b4b8dc2b1fde4c756d35
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:56:59Z

    [FLINK-5785] Add an Imputer for preparing data
    
    should work now

commit 07507b5ca0f1cfebc38f96bb8db32c10f2186bbf
Author: p4nna <b...@dbs.ifi.lmu.de>
Date:   2017-03-31T15:57:37Z

    [FLINK-5785] Add an Imputer for preparing data
    
    tests should work now

----


> Add an Imputer for preparing data
> ---------------------------------
>
>                 Key: FLINK-5785
>                 URL: https://issues.apache.org/jira/browse/FLINK-5785
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Stavros Kontopoulos
>            Assignee: Stavros Kontopoulos
>
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values, 
> either using the mean, the median or the most frequent value of the row or 
> column in which the missing values are located. This class also allows for 
> different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
