Till Rohrmann created FLINK-1537:
------------------------------------

             Summary: GSoC project: Machine learning with Apache Flink
                 Key: FLINK-1537
                 URL: https://issues.apache.org/jira/browse/FLINK-1537
             Project: Flink
          Issue Type: Improvement
            Reporter: Till Rohrmann


The Flink community is currently setting up the infrastructure for a machine
learning library for Flink. The goal is to provide a set of highly optimized ML
algorithms and a high-level linear algebra abstraction that makes data pre- and
post-processing easy. Defining a common set of data structures on which the
algorithms operate will make it possible to compose them into complex
processing pipelines.

The Mahout DSL is a good fit for the linear algebra language in Flink. What
still has to be evaluated is which mechanisms are needed to allow an easy
transition between the high-level abstraction and the optimized algorithms.

The machine learning library offers multiple starting points for a GSoC
project. Among others, the following projects are conceivable:

* Extension of Flink's machine learning library by additional ML algorithms
** Stochastic gradient descent
** Distributed dual coordinate ascent
** SVM
** Gaussian mixture EM
** Decision trees
** ...
* Integration of Flink with the Mahout DSL to support a high-level linear
algebra abstraction
* Integration of H2O with Flink to benefit from H2O's sophisticated machine 
learning algorithms
* Implementation of a parameter-server-like distributed global state storage
facility for Flink. This also includes extending Flink to support
asynchronous iterations and update messages.
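
To make the parameter server idea in the last item more concrete, here is a
minimal single-process sketch in Python. It is not Flink code and does not use
any Flink API: threads stand in for distributed workers, and the names
ParameterServer, pull, push, worker and run_async_descent as well as the
synthetic data are assumptions made purely for illustration. Each worker reads
the shared weight, computes a gradient on its own data shard and sends an
update message, without waiting for the other workers at a barrier.

```python
import threading


class ParameterServer:
    """Toy in-process stand-in for a distributed global state store."""

    def __init__(self, initial_weight=0.0):
        self._weight = initial_weight
        self._lock = threading.Lock()

    def pull(self):
        # Workers read the current global state whenever they like.
        with self._lock:
            return self._weight

    def push(self, delta):
        # An "update message": applied on arrival, no barrier between workers.
        with self._lock:
            self._weight += delta


def worker(server, shard, steps, learning_rate):
    """Fit y = w * x by gradient descent on this worker's data shard."""
    for _ in range(steps):
        w = server.pull()  # may already be stale when the update is pushed
        grad = sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)
        server.push(-learning_rate * grad)


def run_async_descent(num_workers=4, steps=200, learning_rate=0.05):
    # Synthetic, noise-free data with true weight 3.0 (made up for the sketch).
    data = [(0.1 * i, 3.0 * 0.1 * i) for i in range(1, 21)]
    server = ParameterServer()
    threads = [
        threading.Thread(
            target=worker,
            args=(server, data[k::num_workers], steps, learning_rate),
        )
        for k in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull()


if __name__ == "__main__":
    print(run_async_descent())
```

The workers never synchronize with each other, so an update may be based on a
weight that other workers have changed in the meantime. With a small learning
rate the shared weight should still approach the true value of 3.0. A real
facility for Flink would additionally have to distribute the state and
integrate with Flink's iteration mechanism, which this sketch does not attempt.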

Your own ideas for possible contributions in the area of the machine learning
library are highly welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
