Hi all!
Sorry for joining this discussion late (I have already missed some of the deadlines set in this thread).
*Here are some thoughts about what we can do immediately*
(1) Grow the ML community by adding committers with a dedicated. Irrespective of any direction decision, this is a must.
actively developing them.
Thanks,
Soila
From: Theodore Vasiloudis [mailto:theodoros.vasilou...@gmail.com]
Sent: Friday, March 3, 2017 4:11 AM
To: dev@flink.apache.org
Cc: Kavulya, Soila P
Subject: Re: [DISCUSS] Flink ML roadmap
It seems like a relatively new project, backed by Intel.
My impression from the doc Roberto linked is that they might switch to
using Beam instead of Spark (?)
I'm cc'ing Soila, who is a developer of TAP and has worked on FlinkML in the
past; perhaps she has some input on how they plan to work with
Interesting, thanks @Roberto. I see that only the TAP Analytics Toolkit
supports streaming. I am not aware of its market share; anyone?
Best,
Stavros
On Fri, Mar 3, 2017 at 11:50 AM, Theodore Vasiloudis <
theodoros.vasilou...@gmail.com> wrote:
Thank you for the links Roberto, I did not know that Beam was working on an
ML abstraction as well. I'm sure we can learn from that.
I'll start another thread today where we can discuss next steps and action
points now that we have a few different paths to follow listed on the
shared doc,
since our
Hi All,
First of all I'd like to introduce myself: my name is Roberto Bentivoglio
and I'm currently working for Radicalbit, as is Andrea Spina (he already wrote
in this thread).
I didn't have the chance to contribute directly to Flink up to now, but
some colleagues of mine have been doing that since at leas
Hi Philipp,
It's great to hear you are interested in Flink ML!
Based on your description, your prototype seems like an interesting
approach for combining online+offline learning. If you're interested, we
might find a way to integrate your work, or at least your ideas, into
Flink ML if we deci
Hello all,
I’m new to this mailing list and I wanted to introduce myself. My name is
Philipp Zehnder and I’m a Master's student in Computer Science at the Karlsruhe
Institute of Technology in Germany, currently writing my master’s thesis with
the main goal to integrate reusable machine learnin
@Theodore, thanks for taking the lead in the coordination :)
Let's see what we can do, and then decide what should start out as an
independent project or stay strictly inside Flink.
I agree that something experimental like batch ML on streaming would
probably benefit from starting in an independent repo.
O
Sure, having a deadline of March 3rd is fine. I can act as coordinator,
trying to guide the discussion to concrete results.
For committers, it's up to their discretion and time whether they want to
participate. I don't think it's necessary to have one, but it would be most
welcome.
@Katherin I would su
Okay, let's just aim for around the end of next week, but we can take
more time to discuss if there's still a lot of ongoing activity. Keep
the topic hot!
Thanks all for the enthusiasm :)
On 2017-02-23 16:17, Stavros Kontopoulos wrote:
@Gabor, 3rd March is ok for me. But maybe giving it a bit more time, like
a week, may suit more people.
What do you think, all?
I will contribute to the doc.
+100 for having a coordinator + committer.
Thank you all for joining the discussion.
Cheers,
Stavros
On Thu, Feb 23, 2017 at 4:48 PM, Gábo
Okay, I've created a skeleton of the design doc for choosing a direction:
https://docs.google.com/document/d/1afQbvZBTV15qF3vobVWUjxQc49h3Ud06MIRhahtJ6dw/edit?usp=sharing
Much of the pros/cons have already been discussed here, so I'll try to
put there all the arguments mentioned in this thread.
I have already asked some teams for useful cases, but all of them need time
to think.
Something will eventually emerge during the analysis.
Maybe we can ask Flink's partners for cases? Data Artisans has the results
of a customer survey [1]: better ML support is wanted, so we could ask what
exactly is necess
+100 for a design doc.
Could we also set a roadmap after some time-boxed investigation captured in
that document? We need action.
Looking forward to working on this (whatever that might be) ;) Also, are there
any data supporting one direction or the other from a customer perspective?
It would help to
Yes, ok.
Let's start a design document and write down the ideas already mentioned
there: parameter server, Clipper, and others. It would be nice if
we also mapped these approaches to use cases.
We will work on it collaboratively, topic by topic; maybe we will finally form
some picture that could
I agree that it's better to go in one direction first, but I think
online and offline with the streaming API can go somewhat in parallel later. We
could set a short-term goal, concentrate initially on one direction, and
showcase that direction (e.g. in a blog post). But first, we should list
the pros/
I'm not sure that this is feasible; doing everything at the same time could
mean doing nothing.
I'm just afraid that the words "we will work on streaming, not on batching;
we have no committer time for this" mean that, yes, we started work on
FLINK-1730, but nobody will commit this work in the end, as it a
@Theodore: Great to hear you think the "batch on streaming" approach is
possible! Of course, we need to pay attention to all the pitfalls there if
we go that way.
+1 for a design doc!
I would add that it's possible to make efforts in all three
directions (i.e. batch, online, batch on stream
Hello all,
@Gabor, we have discussed the idea of using the streaming API to write all
of our ML algorithms with a couple of people offline,
and I think it might be possible and is generally worth a shot. The
approach we would take would be close to Vowpal Wabbit: not exactly
"online", but rather
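For readers unfamiliar with the Vowpal Wabbit-style approach mentioned above: the core idea is to update the model one example at a time as data arrives, instead of materializing a batch. The sketch below is not FlinkML or Vowpal Wabbit code; it's a minimal, self-contained Python illustration of that per-record update pattern, with all names (`online_sgd`, the synthetic stream) invented for the example.

```python
# Hypothetical illustration (not FlinkML or Vowpal Wabbit code): online SGD
# for a one-feature linear model, consuming examples one at a time the way
# a streaming job would deliver them.
import random

def online_sgd(stream, lr=0.05):
    """Incrementally fit y ~ w*x; the weight is updated per record,
    so no batch of data is ever materialized."""
    w = 0.0
    for x, y in stream:
        grad = (w * x - y) * x  # gradient of 0.5*(w*x - y)^2 w.r.t. w
        w -= lr * grad
    return w

# Simulated stream: y = 3x plus a little noise, consumed lazily.
random.seed(42)
stream = ((x, 3.0 * x + random.gauss(0.0, 0.01))
          for x in (random.uniform(-1.0, 1.0) for _ in range(2000)))
print(online_sgd(stream))  # converges close to 3.0
```

In an actual Flink job, the per-record update would presumably live in a stateful operator over a DataStream, with the weight kept in operator state; the arithmetic, however, is the same.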
It's great to see so much activity in this discussion :)
I'll try to add my thoughts.
I think building a developer community (Till's point 2) can be slightly
separated from what features we should aim for (point 1) and showcasing
(point 3). Thanks Till for bringing up the ideas for restructu
Till, thank you for your response.
But I need to clarify several points:
1) Yes, batch and batch ML is a field full of alternatives, but in my
opinion that doesn’t mean that we should ignore the problem of not
developing the batch part of Flink. You know, Apache Beam and Apache Mahout
both feel th
Ok, I see. Suppose we solve all the critical issues. And suppose we don't go
with the pure online model (although online ML has potential)... should
we move on with the current ML implementation, which is for batch processing
(to the best of my knowledge)? The parameter server problem is a long stan
Thanks a lot for all your valuable input. It's great to see all your
interest in Flink and its ML library :-)
1) Direction of FlinkML
In order to reboot the FlinkML library, we should indeed first decide on its
direction and come up with a roadmap to get the community behind it.
Since we only have l
Thank you all for your thoughts on the matter.
Andrea brought up some further engine considerations that we need to
address in order to have a competitive ML engine on Flink.
I'm happy to see many people willing to contribute to the development of ML
on Flink. The way I see it, there needs to be
Hi all,
Thanks Stavros for pushing forward this discussion, which I feel is really
relevant.
Since I'm only just now becoming active in the community and don't yet have
enough experience or visibility around the Flink community, I'll limit
myself to sharing an opinion as a Flink user.
I'm using Flin
I think Flink ML could be a success. Many use cases out there could benefit
from such algorithms, especially online ones.
I agree examples should be created showing how it could be used.
I was not aware of the project restructuring issues. GPUs are really
important nowadays, but still not the m
Hello guys,
My couple of cents.
All Flink presentations, articles, etc. articulate that Flink is for ETL and
data ingestion; CEP at most.
If you visit http://flink.apache.org/usecases.html, you'll see there aren't any
explicit ML or graph use cases there.
It's also stated that Flink is suitable when "Data th
Hello guys,
Maybe we will be able to focus our forces on some end-to-end scenario or
showcase for Flink as an ML-supporting engine, and in such a way actualize
the roadmap?
This means: we can take some real-life/production problem, like fraud
detection in some area, and try to solve this problem f
Hello all,
thank you for opening this discussion, Stavros; note that it's almost
exactly one year since I last opened such a topic (linked by Gabor), and the
comments there are still relevant.
I think Gabor described the current state quite well, development in the
libraries is hard without committer
Hi Stavros,
Thanks for bringing this up.
There have been past [1] and recent [2, 3] discussions about the Flink
libraries, because there are some stalled PRs and overloaded
committers. (Actually, Till is the only committer shepherding both
the CEP and ML libraries, and AFAIK he has a ton o