Thanks for the feedback, Rong and Flavio.

@Rong Rong
> There's another thread regarding a close-to-merge FLIP-23 implementation
> [1]. I agree this might still be an early stage to talk about
> productionizing and model serving. But it would be nice to keep in mind
> in the design/implementation that ease of use for productionizing a ML
> pipeline is also very important.
> And if we can leverage the implementation in FLIP-23 in the future (some
> adjustment might be needed), that would be super helpful.

You raised a very good point. Actually, I have been reviewing FLIP-23 for a
while (mostly offline, to help Boris polish the PR). FMPOV, FLIP-23 and
FLIP-39 can be well unified at some point. Model serving in FLIP-23 is
actually a special case of the "transformer/model" proposed in FLIP-39.
Boris's implementation of model serving can be designed as an abstract
class on top of the transformer/model interface, and then be used by ML
users as a certain ML lib. I have some other comments on FLIP-23 x FLIP-39;
I will reply to the FLIP-23 mailing list thread later with more details.
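To make the unification concrete, here is a rough sketch of what I have in
mind. All class and method names below are illustrative assumptions for
this discussion, not the final API of either FLIP:

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    // FLIP-39 style interfaces, reduced to the essentials for this sketch.
    interface Transformer {
        Table transform(TableEnvironment tEnv, Table input);
    }

    // A Model is just a Transformer produced by training.
    interface Model extends Transformer {
    }

    // FLIP-23 style model serving layered on top: an abstract class whose
    // transform() scores the (possibly unbounded) input, while subclasses
    // only deal with loading/updating the served model artifact.
    abstract class ServableModel implements Model {

        // Backend-specific scoring, e.g. evaluating a PMML or TensorFlow
        // model against each incoming row.
        protected abstract Table score(TableEnvironment tEnv, Table input);

        @Override
        public Table transform(TableEnvironment tEnv, Table input) {
            // In pipeline terms, serving is just a streaming transform().
            return score(tEnv, input);
        }
    }

This way the serving runtime stays a library on top of the pipeline
interfaces, and users can mix served models with ordinary transformers in
one pipeline.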
@Flavio
> I have read many discussions about Flink ML and none of them take into
> account the ongoing efforts carried out by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefit
> from a joint effort on this side.
> [1] https://h2020-streamline-project.eu/objectives/

Thank you for the info. I was not aware of the Streamline H2020 projects
before. I just took a quick look at the website and GitHub. IMO these
projects could be very good Flink ecosystem projects and could be built on
top of the ML pipeline & ML lib interfaces introduced in FLIP-39. I will
try to contact the owners of these projects to understand their plans and
any blockers to using Flink. In the meantime, if you have a direct contact
who might be interested in the ML pipeline & ML lib, please share it with
me.

Regards,
Shaoxuan

On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <pomperma...@okkam.it>
wrote:

> Hi to all,
> I have read many discussions about Flink ML and none of them take into
> account the ongoing efforts carried out by the Streamline H2020 project
> [1] on this topic.
> Have you tried to ping them? I think that both projects could benefit
> from a joint effort on this side.
> [1] https://h2020-streamline-project.eu/objectives/
>
> Best,
> Flavio
>
> On Thu, May 2, 2019 at 12:18 AM Rong Rong <walter...@gmail.com> wrote:
>
> > Hi Shaoxuan/Weihua,
> >
> > Thanks for the proposal and for driving the effort.
> > I also replied to the original discussion thread, and still a +1 on
> > moving towards the scikit-learn model.
> > I just left a few comments on the API details and some general
> > questions. Please kindly take a look.
> >
> > There's another thread regarding a close-to-merge FLIP-23
> > implementation [1]. I agree this might still be an early stage to talk
> > about productionizing and model serving. But it would be nice to keep
> > in mind in the design/implementation that ease of use for
> > productionizing a ML pipeline is also very important.
> > And if we can leverage the implementation in FLIP-23 in the future
> > (some adjustment might be needed), that would be super helpful.
> >
> > Best,
> > Rong
> >
> > [1]
> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
> >
> >
> > On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <wshaox...@gmail.com>
> > wrote:
> >
> > > Thanks for all the feedback.
> > >
> > > @Jincheng Sun
> > > > I recommend adding a detailed implementation plan to the FLIP and
> > > > the google doc.
> > > Yes, I will add a subsection for the implementation plan.
> > >
> > > @Chen Qin
> > > > Just sharing some insights from operating SparkML at scale:
> > > > - map-reduce may not be the best way to iteratively sync
> > > > partitioned workers.
> > > > - native hardware acceleration is key to adopting the rapid
> > > > changes in ML in the foreseeable future.
> > > Thanks for sharing your experience with SparkML. The purpose of this
> > > FLIP is mainly to provide the interfaces for the ML pipeline and ML
> > > lib, and the implementations of most standard algorithms. Beyond
> > > this FLIP, for AI computing on Flink, we will continue to contribute
> > > efforts such as the enhancement of iteration and the integration of
> > > deep learning engines (such as TensorFlow/PyTorch).
> > > I have presented part of this work in
> > > https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
> > > I am not sure I have fully understood your comments. Could you
> > > please elaborate with more details and, if possible, suggest what we
> > > should work on to address the challenges you mentioned?
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > > On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qinnc...@gmail.com>
> > > wrote:
> > >
> > > > Just sharing some insights from operating SparkML at scale:
> > > > - map-reduce may not be the best way to iteratively sync
> > > > partitioned workers.
> > > > - native hardware acceleration is key to adopting the rapid
> > > > changes in ML in the foreseeable future.
> > > >
> > > > Chen
> > > >
> > > > On Apr 29, 2019, at 11:02, jincheng sun <sunjincheng...@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Shaoxuan,
> > > > >
> > > > > Thanks for making further efforts to enhance the scalability and
> > > > > ease of use of Flink ML and take it one step further. Thank you
> > > > > for sharing a lot of context information.
> > > > >
> > > > > Big +1 for this proposal!
> > > > >
> > > > > Only one suggestion: there is only a short time left until the
> > > > > release of Flink 1.9, so I recommend adding a detailed
> > > > > implementation plan to the FLIP and the google doc.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > > > On Mon, Apr 29, 2019 at 10:34 AM Shaoxuan Wang
> > > > > <wshaox...@gmail.com> wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> Weihua proposed rebuilding the Flink ML pipeline on top of the
> > > > >> TableAPI several months ago in this mail thread:
> > > > >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
> > > > >>
> > > > >> Luogen, Becket, Xu, Weihua and I have been working on this
> > > > >> proposal offline in the past few months. Now we want to share
> > > > >> the first phase of the entire proposal as a FLIP. In FLIP-39,
> > > > >> we want to achieve several things (and hope they can be
> > > > >> accomplished and released in Flink 1.9):
> > > > >>
> > > > >> - Provide a new set of ML core interfaces (on top of the Flink
> > > > >>   TableAPI)
> > > > >> - Provide an ML pipeline interface (on top of the Flink
> > > > >>   TableAPI)
> > > > >> - Provide the interfaces for parameter management and
> > > > >>   pipeline/model persistence
> > > > >> - All the above interfaces should facilitate any new ML
> > > > >>   algorithm. We will gradually add various standard ML
> > > > >>   algorithms on top of these newly proposed interfaces to
> > > > >>   ensure their feasibility and scalability.
> > > > >>
> > > > >> Part of this FLIP was presented at Flink Forward 2019 @ San
> > > > >> Francisco by Xu and me.
> > > > >> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
> > > > >> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
> > > > >>
> > > > >> You can find the videos & slides at
> > > > >> https://www.ververica.com/flink-forward-san-francisco-2019
> > > > >>
> > > > >> The design document for FLIP-39 can be found here:
> > > > >> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
> > > > >>
> > > > >> I am looking forward to your feedback.
> > > > >>
> > > > >> Regards,
> > > > >>
> > > > >> Shaoxuan
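P.S. To make the fit/transform flow described in the quoted announcement
above concrete, here is a minimal usage sketch. The interface stand-ins and
the driver below are placeholders made up for illustration; the actual
interface definitions (including parameter management and persistence) are
in the design doc linked above:

    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;

    // Minimal stand-ins for the proposed interfaces: an Estimator is fit
    // on a Table and produces a Model, which is itself a Transformer.
    interface Transformer {
        Table transform(TableEnvironment tEnv, Table input);
    }

    interface Model extends Transformer {
    }

    interface Estimator<M extends Model> {
        M fit(TableEnvironment tEnv, Table training);
    }

    class TrainAndPredict {
        // Hypothetical driver: fit an estimator on labeled data, then use
        // the resulting model to score an unlabeled table.
        static Table run(TableEnvironment tEnv, Table training,
                         Table unlabeled, Estimator<? extends Model> est) {
            Model model = est.fit(tEnv, training);
            return model.transform(tEnv, unlabeled);
        }
    }

Since every stage consumes and produces a Table, stages compose naturally
into a pipeline, which is the core of the scikit-learn style design.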