Re: [DISCUSS] Flink ML roadmap

2017-03-14 Thread Stephan Ewen
Hi all! Sorry for joining this discussion late (I have already missed some of the deadlines set in this thread). *Here are some thoughts about what we can do immediately* (1) Grow ML community by adding committers with a dedicated. Irrespective of any direction decision, this is a must.

Re: [DISCUSS] Flink ML roadmap

2017-03-10 Thread Till Rohrmann
; actively developing them. > > Thanks, > > Soila > > From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com] > Sent: Friday, March 3, 2017 4:11 AM > To: dev@flink.apache.org > Cc: Kavulya, Soila P > Subject: Re: [DISCUSS] Flink ML roadmap > > It seems lik

RE: [DISCUSS] Flink ML roadmap

2017-03-08 Thread Kavulya, Soila P
actively developing them. Thanks, Soila From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com] Sent: Friday, March 3, 2017 4:11 AM To: dev@flink.apache.org Cc: Kavulya, Soila P Subject: Re: [DISCUSS] Flink ML roadmap It seems like a relatively new project, backed by Intel. My impression

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Theodore Vasiloudis
It seems like a relatively new project, backed by Intel. My impression from the doc Roberto linked is that they might switch to using Beam instead of Spark (?). I'm cc'ing Soila, who is a developer of TAP and has worked on FlinkML in the past; perhaps she has some input on how they plan to work with

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Stavros Kontopoulos
Interesting, thanks @Roberto. I see that only TAP Analytics Toolkit supports streaming. I am not aware of its market share, anyone? Best, Stavros On Fri, Mar 3, 2017 at 11:50 AM, Theodore Vasiloudis < theodoros.vasilou...@gmail.com> wrote: > Thank you for the links Roberto I did not know that Be

Re: [DISCUSS] Flink ML roadmap

2017-03-03 Thread Theodore Vasiloudis
Thank you for the links, Roberto. I did not know that Beam was working on an ML abstraction as well. I'm sure we can learn from that. I'll start another thread today where we can discuss next steps and action points now that we have a few different paths to follow listed on the shared doc, since our

Re: [DISCUSS] Flink ML roadmap

2017-03-02 Thread Roberto Bentivoglio
Hi All, First of all I'd like to introduce myself: my name is Roberto Bentivoglio and I'm currently working for Radicalbit as Andrea Spina (he already wrote on this thread). I didn't have the chance to directly contribute on Flink up to now, but some colleagues of mine are doing that since at leas

Re: [DISCUSS] Flink ML roadmap

2017-02-28 Thread Gábor Hermann
Hi Philipp, It's great to hear you are interested in Flink ML! Based on your description, your prototype seems like an interesting approach for combining online+offline learning. If you're interested, we might find a way to integrate your work, or at least your ideas, into Flink ML if we deci

Re: [DISCUSS] Flink ML roadmap

2017-02-27 Thread Philipp Zehnder
Hello all, I’m new to this mailing list and I wanted to introduce myself. My name is Philipp Zehnder and I’m a master’s student in Computer Science at the Karlsruhe Institute of Technology in Germany, currently writing my master’s thesis with the main goal of integrating reusable machine learnin

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore, thanks for taking the lead in the coordination :) Let's see what we can do, and then decide what should start out as an independent project, or strictly inside Flink. I agree that something experimental like batch ML on streaming would probably benefit more from an independent repo first. O

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Sure, having a deadline of March 3rd is fine. I can act as coordinator, trying to guide the discussion to concrete results. For committers, it's up to their discretion and time whether they want to participate. I don't think it's necessary to have one, but it would be most welcome. @Katherin I would su

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, let's just aim for around the end of next week, but we can take more time to discuss if there's still a lot of ongoing activity. Keep the topic hot! Thanks all for the enthusiasm :) On 2017-02-23 16:17, Stavros Kontopoulos wrote: @Gabor 3rd March is ok for me. But maybe giving a bit mo

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
@Gabor 3rd March is ok for me. But maybe giving it a bit more time, like a week, may suit more people. What do you all think? I will contribute to the doc. +100 for having a co-ordinator + committer. Thank you all for joining the discussion. Cheers, Stavros On Thu, Feb 23, 2017 at 4:48 PM, Gábo

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
Okay, I've created a skeleton of the design doc for choosing a direction: https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing Much of the pros/cons have already been discussed here, so I'll try to put there all the arguments mentioned in this thread.

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I have already asked some teams for useful cases, but all of them need time to think. During analysis something will finally arise. Maybe we can ask partners of Flink for cases? Data Artisans got results of a customer survey: [1], better ML support is wanted, so we could ask what exactly is necess

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Stavros Kontopoulos
+100 for a design doc. Could we also set a roadmap after some time-boxed investigation captured in that document? We need action. Looking forward to working on this (whatever that might be) ;) Also, are there any data supporting one direction or the other from a customer perspective? It would help to

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
Yes, ok. Let's start a design document and write down there the ideas already mentioned: the parameter server, Clipper, and others. It would be nice if we also mapped these approaches to cases. We will work on it collaboratively on each topic; maybe we will finally form some picture, that could

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
I agree that it's better to go in one direction first, but I think online and offline with streaming API can proceed somewhat in parallel later. We could set a short-term goal, concentrate initially on one direction, and showcase that direction (e.g. in a blogpost). But first, we should list the pros/

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Katherin Eri
I'm not sure that this is feasible; doing everything at the same time could mean doing nothing. I'm just afraid that the words "we will work on streaming, not on batching; we have no committer's time for this" mean that yes, we started work on FLINK-1730, but nobody will commit this work in the end, as it a

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Gábor Hermann
@Theodore: Great to hear you think the "batch on streaming" approach is possible! Of course, we need to pay attention to all the pitfalls there, if we go that way. +1 for a design doc! I would add that it's possible to make efforts in all three directions (i.e. batch, online, batch on stream

Re: [DISCUSS] Flink ML roadmap

2017-02-23 Thread Theodore Vasiloudis
Hello all, @Gabor, we have discussed the idea of using the streaming API to write all of our ML algorithms with a couple of people offline, and I think it might be possible and is generally worth a shot. The approach we would take would be close to Vowpal Wabbit, not exactly "online", but rather

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Gábor Hermann
It's great to see so much activity in this discussion :) I'll try to add my thoughts. I think building a developer community (Till's 2. point) can be slightly separated from what features we should aim for (1. point) and showcasing (3. point). Thanks Till for bringing up the ideas for restructu

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Katherin Eri
Till, thank you for your response. But I need to clarify several points: 1) Yes, batch and batch ML is a field full of alternatives, but in my opinion that doesn’t mean that we should ignore the problem of not developing the batch part of Flink. You know: Apache Beam, Apache Mahout they both feel th

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Stavros Kontopoulos
Ok I see. Suppose we solve all the critical issues. And suppose we don't go with the pure online model (although online ML has potential)... should we move on with the current ML implementation, which is for batch processing (to the best of my knowledge)? The parameter server problem is a long stan

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Till Rohrmann
Thanks a lot for all your valuable input. It's great to see all your interest in Flink and its ML library :-) 1) Direction of FlinkML In order to reboot the FlinkML library we should indeed first decide on its direction and come up with a roadmap to get the community behind. Since we only have l

Re: [DISCUSS] Flink ML roadmap

2017-02-21 Thread Theodore Vasiloudis
Thank you all for your thoughts on the matter. Andrea brought up some further engine considerations that we need to address in order to have a competitive ML engine on Flink. I'm happy to see many people willing to contribute to the development of ML on Flink. The way I see it, there needs to be

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Andrea Spina
Hi all, Thanks Stavros for pushing forward the discussion, which I feel is really relevant. Since I'm only now actively approaching the community and don't yet have much experience or visibility around the Flink community, I'd limit myself to sharing an opinion as a Flink user. I'm using Flin

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Stavros Kontopoulos
I think Flink ML could be a success. Many use cases out there could benefit from such algorithms, especially online ones. I agree examples should be created showing how it could be used. I was not aware of the project re-structuring issues. GPUs are really important nowadays but it is still not the m

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Timur Shenkao
Hello guys, My couple of cents. All Flink presentations, articles, etc. articulate that Flink is for ETL, data ingestion. CEP is a maximum. If you visit http://flink.apache.org/usecases.html, you'll see there aren't any explicit ML or Graphs there. It's also stated that Flink is suitable when "Data th

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Katherin Eri
Hello guys, Maybe we will be able to focus our forces on some E2E scenario or showcase for Flink as an engine that also supports ML, and in such a way actualize the roadmap? This means: we can take some real life/production problem, like fraud detection in some area, and try to solve this problem f

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Theodore Vasiloudis
Hello all, thank you for opening this discussion Stavros, note that it's almost exactly 1 year since I last opened such a topic (linked by Gabor) and the comments there are still relevant. I think Gabor described the current state quite well, development in the libraries is hard without committer

Re: [DISCUSS] Flink ML roadmap

2017-02-20 Thread Gábor Hermann
Hi Stavros, Thanks for bringing this up. There have been past [1] and recent [2, 3] discussions about the Flink libraries, because there are some stalling PRs and overloaded committers. (Actually, Till is the only committer shepherd of both the CEP and ML libraries, and AFAIK he has a ton o