Re: [DISCUSS] Flink ML roadmap

Gábor Hermann Mon, 20 Feb 2017 03:49:03 -0800

Hi Stavros,

Thanks for bringing this up.

There have been past [1] and recent [2, 3] discussions about the Flink libraries, because there are some stalling PRs and overloaded committers. (Actually, Till is the only committer shepherd of both the CEP and ML libraries, and AFAIK he has a ton of other responsibilities and work to do.) Thus it's hard to get code reviewed and merged, and without merged code it's hard to get committer status, so there are not many committers who can review e.g. ML algorithm implementations, and the cycle goes on. Until this is resolved somehow, we should help the committers by reviewing each other's PRs.

I think prioritizing features (b) is a good way to start. We could declare the most blocking features and concentrate on reviewing and merging them before moving forward. E.g. the evaluation framework is quite important for an ML library in my opinion, and has had a PR stalled for a long time [4].

Regarding c), there are general style guides for contributing to Flink, so we should follow those. Is there something more ML-specific you think we could follow? We should definitely declare that we follow scikit-learn and make sure contributions comply with it.

In terms of features (a, d), I think we should first see the bigger picture. That is, it would be nice to discuss a clearer direction for Flink ML. I've seen a lot of interest in contributing to Flink ML lately. I believe we should rethink our goals, to put the contribution efforts into making a usable and useful library. Are we trying to implement as many useful algorithms as possible to create a scalable ML library? That would seem ambitious, and of course there are a lot of frameworks and libraries that already have something like this as a goal (e.g. Spark MLlib, Mahout). Should we rather create connectors to existing libraries? Then we cannot really do Flink-specific optimizations. Should we go for online machine learning (as Flink is concentrating on streaming)? We already have a connector to SAMOA. We could go on with questions like this. Maybe I'm missing something, but I haven't seen such directions declared.


Cheers,
Gabor

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Opening-a-discussion-on-FlinkML-td10265.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-CEP-development-is-stalling-td15237.html#a15341
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/New-Flink-team-member-Kate-Eri-td15349.html

[4] https://github.com/apache/flink/pull/1849

On 2017-02-20 11:43, Stavros Kontopoulos wrote:

(Resending with the appropriate topic)

Hi,

I would like to start a discussion about next steps for Flink ML.
Currently there is a lot of work going on, but it needs a push forward.

Some topics to discuss:

a) How several features should be planned and get aligned with Flink
releases.
b) Priorities of what should be done.
c) Basic guidelines for code: styleguides, scikit-learn compliance etc
d) Missing features important for the success of the library, next steps
etc...

Thoughts?

Best,
Stavros
