Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Gen Luo Mon, 17 Jun 2019 03:27:16 -0700

Hi all,

In the review of PR for FLINK-12473, there were a few comments regarding
pipeline exportation. We would like to start a follow up discussions to
address some related comments.


Currently, FLIP-39 proposal gives a way for users to persist a pipeline in
JSON format. But it does not specify how users can export a pipeline for
serving purpose. We summarized some thoughts on this in the following doc.

https://docs.google.com/document/d/1B84b-1CvOXtwWQ6_tQyiaHwnSeiRqh-V96Or8uHqCp8/edit?usp=sharing

After we reach consensus on the pipeline exportation, we will add a
corresponding section in FLIP-39.


Shaoxuan Wang <[email protected]> 于2019年6月5日周三 上午8:47写道：

> Stavros,
> They have the similar logic concept, but the implementation details are
> quite different. It is hard to migrate the interface with different
> implementations. The built-in algorithms are useful legacy that we will
> consider migrate to the new API (but still with different implementations).
> BTW, the new API has already been merged via FLINK-12473.
>
> Thanks,
> Shaoxuan
>
>
>
> On Mon, Jun 3, 2019 at 6:08 PM Stavros Kontopoulos <
> [email protected]>
> wrote:
>
> > Hi,
> >
> > Some portion of the code could be migrated to the new Table API no?
> > I am saying that because the new API design is based on scikit-learn and
> > the old one was also inspired by it.
> >
> > Best,
> > Stavros
> > On Wed, May 22, 2019 at 1:24 PM Shaoxuan Wang <[email protected]>
> wrote:
> >
> > > Another consensus (from the offline discussion) is that we will
> > > delete/deprecate flink-libraries/flink-ml. I have started a survey and
> > > discussion [1] in dev/user-ml to collect the feedback. Depending on the
> > > replies, we will decide if we shall delete it in Flink1.9 or
> > > deprecate&delete in the next release after 1.9.
> > >
> > > [1]
> > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-ml-and-DISCUSS-Delete-flink-ml-td29057.html
> > >
> > > Regards,
> > > Shaoxuan
> > >
> > >
> > > On Tue, May 21, 2019 at 9:22 PM Gen Luo <[email protected]> wrote:
> > >
> > > > Yes, this is our conclusion. I'd like to add only one point that
> > > > registering user defined aggregator is also needed which is currently
> > > > provided by 'bridge' and finally will be merged into Table API. It's
> > same
> > > > with collect().
> > > >
> > > > I will add a TableEnvironment argument in Estimator.fit() and
> > > > Transformer.transform() to get rid of the dependency on
> > > > flink-table-planner. This will be committed soon.
> > > >
> > > > Aljoscha Krettek <[email protected]> 于2019年5月21日周二 下午7:31写道：
> > > >
> > > > > We discussed this in private and came to the conclusion that we
> > should
> > > > > (for now) have the dependency on flink-table-api-xxx-bridge because
> > we
> > > > need
> > > > > access to the collect() method, which is not yet available in the
> > Table
> > > > > API. Once that is available the code can be refactored but for now
> we
> > > > want
> > > > > to unblock work on this new module.
> > > > >
> > > > > We also agreed that we don’t need a direct dependency on
> > > > > flink-table-planner.
> > > > >
> > > > > I hope I summarised our discussion correctly.
> > > > >
> > > > > > On 17. May 2019, at 12:20, Gen Luo <[email protected]> wrote:
> > > > > >
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > For the first question, it's not strictly necessary. But I perfer
> > not
> > > > to
> > > > > > have a TableEnvironment argument in Estimator.fit() or
> > > > > > Transformer.transform(), which is not part of machine learning
> > > concept,
> > > > > and
> > > > > > may make our API not as clean and pretty as other systems do. I
> > would
> > > > > like
> > > > > > another way other than introducing flink-table-planner to do
> this.
> > If
> > > > > it's
> > > > > > impossible or severely opposed, I may make the concession to add
> > the
> > > > > > argument.
> > > > > >
> > > > > > Other than that, "flink-table-api-xxx-bridge"s are still needed.
> A
> > > vary
> > > > > > common case is that an algorithm needs to guarantee that it's
> > running
> > > > > under
> > > > > > a BatchTableEnvironment, which makes it possible to collect
> result
> > > each
> > > > > > iteration. A typical algorithm like this is ALS. By flink1.8,
> this
> > > can
> > > > be
> > > > > > only achieved by converting Table to DataSet than call
> > > > DataSet.collect(),
> > > > > > which is available in flink-table-api-xxx-bridge. Besides,
> > > registering
> > > > > > UDAGG is also depending on it.
> > > > > >
> > > > > > In conclusion, '"planner" can be removed from dependencies but
> > > > > introducing
> > > > > > "bridge"s are inevitable. Whether and how to acquire
> > TableEnvironment
> > > > > from
> > > > > > a Table can be discussed.
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-39: Flink ML pipeline and ML libs

Reply via email to