Thanks for following up promptly and sharing the feedback, @shaoxuan. Yes, I share your view that these two FLIPs will eventually converge. I also have some questions regarding the API as well as the possible convergence challenges (especially the current co-processor approach vs. FLIP-39's Table API approach). I will follow up on the discussion thread and the FLIP-23 PR with you and Boris :-)
--
Rong

On Mon, May 6, 2019 at 3:30 AM Shaoxuan Wang <wshaox...@gmail.com> wrote:
>
> Thanks for the feedback, Rong and Flavio.
>
> @Rong Rong
> > There's another thread regarding a close-to-merge FLIP-23 implementation
> > [1]. I agree this might still be an early stage to talk about
> > productionizing and model serving, but it would be nice to keep in mind
> > in the design/implementation that ease of use for productionizing an ML
> > pipeline is also very important.
> > And if we can leverage the implementation in FLIP-23 in the future (some
> > adjustment might be needed), that would be super helpful.
> You raised a very good point. Actually, I have been reviewing FLIP-23 for
> a while (mostly offline, to help Boris polish the PR). From my point of
> view, FLIP-23 and FLIP-39 can be well unified at some point. Model serving
> in FLIP-23 is actually a special case of the "transformer/model" proposed
> in FLIP-39. Boris's implementation of model serving can be designed as an
> abstract class on top of the transformer/model interface, and then be used
> by ML users as a certain ML lib. I have some other comments w.r.t. FLIP-23
> x FLIP-39; I will reply to the FLIP-23 mailing list later with more
> details.
>
> @Flavio
> > I have read many discussions about Flink ML and none of them take into
> > account the ongoing efforts carried out by the Streamline H2020 project
> > [1] on this topic.
> > Have you tried to ping them? I think that both projects could benefit
> > from a joint effort on this side..
> > [1] https://h2020-streamline-project.eu/objectives/
> Thank you for your info. I was not aware of the Streamline H2020 projects
> before. I just did a quick look at the website and GitHub. IMO these
> projects could be very good Flink ecosystem projects and can be built on
> top of the ML pipeline & ML lib interfaces introduced in FLIP-39. I will
> try to contact the owners of these projects to understand their plans and
> blockers for using Flink (if there are any).
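[Editor's sketch, for readers of the archive: the "model serving as a special case of transformer" idea above can be illustrated roughly as follows. This is a minimal Python sketch of the API shape only; all names here (Transformer, ModelServingTransformer) are hypothetical, and the real FLIP-39 interfaces are defined in Java/Scala on top of the Flink Table API in the design doc linked later in the thread.]

```python
# Sketch: FLIP-23-style model serving expressed as one concrete kind of
# FLIP-39-style Transformer. Tables are mocked as lists of dict rows; a real
# implementation would consume and produce Flink Tables.
from abc import ABC, abstractmethod

class Transformer(ABC):
    """FLIP-39-style transformer: maps an input table to an output table."""
    @abstractmethod
    def transform(self, table):
        ...

class ModelServingTransformer(Transformer):
    """Model serving as a special case of Transformer: load a trained model
    (e.g. a deserialized PMML/TensorFlow model; here just a function) and
    use it to score every incoming row."""
    def __init__(self, model):
        self.model = model

    def transform(self, table):
        # Append a prediction column to each row.
        return [dict(row, prediction=self.model(row)) for row in table]

# Usage: a trivial "model" that thresholds a feature.
serving = ModelServingTransformer(model=lambda row: row["x"] > 0.5)
scored = serving.transform([{"x": 0.2}, {"x": 0.9}])
```

Framing serving this way is what allows an ML-lib user to drop a served model into the same pipeline slot as any other transformer.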
> In the meantime, if you have the direct contact of anyone who might be
> interested in the ML pipeline & ML lib, please share it with me.
>
> Regards,
> Shaoxuan
>
>
> On Thu, May 2, 2019 at 3:59 PM Flavio Pompermaier <pomperma...@okkam.it> wrote:
>> Hi to all,
>> I have read many discussions about Flink ML and none of them take into
>> account the ongoing efforts carried out by the Streamline H2020 project
>> [1] on this topic.
>> Have you tried to ping them? I think that both projects could benefit
>> from a joint effort on this side..
>> [1] https://h2020-streamline-project.eu/objectives/
>>
>> Best,
>> Flavio
>>
>> On Thu, May 2, 2019 at 12:18 AM Rong Rong <walter...@gmail.com> wrote:
>> > Hi Shaoxuan/Weihua,
>> >
>> > Thanks for the proposal and driving the effort.
>> > I also replied to the original discussion thread, and still a +1 on
>> > moving towards the scikit-learn model.
>> > I just left a few comments on the API details and some general
>> > questions. Please kindly take a look.
>> >
>> > There's another thread regarding a close-to-merge FLIP-23
>> > implementation [1]. I agree this might still be an early stage to talk
>> > about productionizing and model serving, but it would be nice to keep
>> > in mind in the design/implementation that ease of use for
>> > productionizing an ML pipeline is also very important.
>> > And if we can leverage the implementation in FLIP-23 in the future
>> > (some adjustment might be needed), that would be super helpful.
>> >
>> > Best,
>> > Rong
>> >
>> > [1]
>> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-23-Model-Serving-td20260.html
>> >
>> >
>> > On Tue, Apr 30, 2019 at 1:47 AM Shaoxuan Wang <wshaox...@gmail.com> wrote:
>> > > Thanks for all the feedback.
>> > >
>> > > @Jincheng Sun
>> > > > I recommend it's better to add a detailed implementation plan to
>> > > > the FLIP and the Google doc.
>> > > Yes, I will add a subsection for the implementation plan.
>> > >
>> > > @Chen Qin
>> > > > Just to share some insights from operating SparkML at scale:
>> > > > - map-reduce may not be the best way to iteratively sync
>> > > > partitioned workers.
>> > > > - native hardware acceleration is key to adopting rapid changes in
>> > > > ML improvements in the foreseeable future.
>> > > Thanks for sharing your experience on SparkML. The purpose of this
>> > > FLIP is mainly to provide the interfaces for the ML pipeline and ML
>> > > lib, and the implementations of most standard algorithms. Besides
>> > > this FLIP, for AI computing on Flink, we will continue to contribute
>> > > efforts such as the enhancement of iteration and the integration of
>> > > deep learning engines (such as TensorFlow/PyTorch). I have presented
>> > > part of this work in
>> > > https://www.ververica.com/resources/flink-forward-san-francisco-2019/when-table-meets-ai-build-flink-ai-ecosystem-on-table-api
>> > > I am not sure if I have fully understood your comments. Can you
>> > > please elaborate on them with more details, and if possible, provide
>> > > some suggestions about what we should work on to address the
>> > > challenges you have mentioned?
>> > >
>> > > Regards,
>> > > Shaoxuan
>> > >
>> > > On Mon, Apr 29, 2019 at 11:28 AM Chen Qin <qinnc...@gmail.com> wrote:
>> > > > Just to share some insights from operating SparkML at scale:
>> > > > - map-reduce may not be the best way to iteratively sync
>> > > > partitioned workers.
>> > > > - native hardware acceleration is key to adopting rapid changes in
>> > > > ML improvements in the foreseeable future.
>> > > >
>> > > > Chen
>> > > >
>> > > > On Apr 29, 2019, at 11:02, jincheng sun <sunjincheng...@gmail.com> wrote:
>> > > > >
>> > > > > Hi Shaoxuan,
>> > > > >
>> > > > > Thanks for putting more effort into enhancing the scalability and
>> > > > > ease of use of Flink ML and taking it one step further.
>> > > > > Thank you for sharing a lot of context information.
>> > > > >
>> > > > > Big +1 for this proposal!
>> > > > >
>> > > > > Only one suggestion here: there is only a short time left until
>> > > > > the release of Flink 1.9, so I recommend adding a detailed
>> > > > > implementation plan to the FLIP and the Google doc.
>> > > > >
>> > > > > What do you think?
>> > > > >
>> > > > > Best,
>> > > > > Jincheng
>> > > > >
>> > > > > Shaoxuan Wang <wshaox...@gmail.com> wrote on Mon, Apr 29, 2019 at 10:34 AM:
>> > > > >
>> > > > >> Hi everyone,
>> > > > >>
>> > > > >> Weihua proposed rebuilding the Flink ML pipeline on top of the
>> > > > >> Table API several months ago in this mail thread:
>> > > > >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Embracing-Table-API-in-Flink-ML-td25368.html
>> > > > >>
>> > > > >> Luogen, Becket, Xu, Weihua and I have been working on this
>> > > > >> proposal offline over the past few months. Now we want to share
>> > > > >> the first phase of the entire proposal as a FLIP. In FLIP-39, we
>> > > > >> want to achieve several things (and hope they can be
>> > > > >> accomplished and released in Flink 1.9):
>> > > > >>
>> > > > >> - Provide a new set of ML core interfaces (on top of the Flink
>> > > > >>   Table API)
>> > > > >> - Provide an ML pipeline interface (on top of the Flink Table
>> > > > >>   API)
>> > > > >> - Provide the interfaces for parameter management and
>> > > > >>   pipeline/model persistence
>> > > > >> - All the above interfaces should facilitate any new ML
>> > > > >>   algorithm. We will gradually add various standard ML
>> > > > >>   algorithms on top of these newly proposed interfaces to ensure
>> > > > >>   their feasibility and scalability.
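[Editor's sketch, for readers of the archive: the scikit-learn-style split the bullets above describe can be sketched roughly as follows. This is a Python sketch of the API shape only, with hypothetical names and toy in-memory "tables"; the real FLIP-39 interfaces are Java/Scala on top of the Flink Table API, as specified in the design doc.]

```python
# Sketch: an Estimator is fit on a table and produces a Transformer (the
# trained model); a Pipeline chains stages of both kinds, fitting each
# Estimator stage in order and replacing it with its model.
from abc import ABC, abstractmethod

class Transformer(ABC):
    @abstractmethod
    def transform(self, table): ...

class Estimator(ABC):
    @abstractmethod
    def fit(self, table):  # returns a Transformer (the trained model)
        ...

class ScaleEstimator(Estimator):
    """Toy estimator: learns the max of column 'x' for max-scaling."""
    def fit(self, table):
        max_x = max(row["x"] for row in table)
        class Scaler(Transformer):
            def transform(self, t):
                return [dict(r, x=r["x"] / max_x) for r in t]
        return Scaler()

class PipelineModel(Transformer):
    """The fitted pipeline: applies each fitted stage in sequence."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, table):
        for s in self.stages:
            table = s.transform(table)
        return table

class Pipeline(Estimator):
    """Chains stages; Estimator stages are fitted and replaced by models."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, table):
        fitted = []
        for s in self.stages:
            s = s.fit(table) if isinstance(s, Estimator) else s
            table = s.transform(table)
            fitted.append(s)
        return PipelineModel(fitted)

# Usage: fit on training rows, then score new rows with the learned model.
data = [{"x": 2.0}, {"x": 4.0}]
model = Pipeline([ScaleEstimator()]).fit(data)
print(model.transform([{"x": 1.0}]))  # -> [{'x': 0.25}]
```

Note how a PipelineModel is itself a Transformer, which is what lets pipelines nest and lets a served model (as in the FLIP-23 discussion above) occupy the same slot as any other stage.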
>> > > > >>
>> > > > >> Part of this FLIP was presented at Flink Forward 2019 @ San
>> > > > >> Francisco by Xu and me:
>> > > > >> https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api
>> > > > >> https://sf-2019.flink-forward.org/conference-program#high-performance-ml-library-based-on-flink
>> > > > >>
>> > > > >> You can find the videos & slides at
>> > > > >> https://www.ververica.com/flink-forward-san-francisco-2019
>> > > > >>
>> > > > >> The design document for FLIP-39 can be found here:
>> > > > >> https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo
>> > > > >>
>> > > > >> I am looking forward to your feedback.
>> > > > >>
>> > > > >> Regards,
>> > > > >>
>> > > > >> Shaoxuan