Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Dong Lin Wed, 08 Feb 2023 23:44:55 -0800

Hi Dian,

Thanks for the review! Please see my reply inline.


Regards,
Dong

On Wed, Feb 8, 2023 at 11:58 AM Dian Fu <dian0511...@gmail.com> wrote:

> Hi Dong,
>
>
> Thanks for driving this effort! This FLIP LGTM overall. I have just a few
> minor comments regarding the proposed API:
>
> 1) For the method `DataFrame.collect()`, why is it named `collect` instead
> of something else, e.g. `get`? Does it mean that the result will be
> computed during this method call?
>

I have chosen the names to be closer to the names of similar classes or
concepts in the existing popular open-source projects.

The reasons for using `collect` include:
- mleap is a popular machine learning serving framework and it uses
LeapFrame#collect()
<https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/frame/LeapFrame.scala#L13>
to represent this concept.
- Collect rows from table seems close to the concept of
TableResult#collect() in Flink.
- The method returns a *collection* of rows from this table, which makes
`collect()` is bit more relevant than `get()`.

Currently, I expect DataFrame.collect() to just return the values already
computed before this method is called.


> 2) For the method `DataFrame.addColumn`, when will this method be used?
>

For example, we might want to implement KMeansModelServable#transform
that appends the prediction result to the input dataFrame. This is similar
to the existing way that KMeansModel#transform
<https://github.com/apache/flink-ml/blob/master/flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java#L84>
appends
the result to the input Table. Note that we probably don't want to copy the
entire input DataFrame to the outpuDataFrame.

We will need to use DataFrame.addColumn to append the prediction result in
this case.


> 3) In the example `runOnlineInferenceOnWebServer`, it uses
> `output_df.getDataType("output")` to get the result type. It's not quite
> intuitive. Does it make sense to add a method `getDataType()` in DataFrame?


I guess you are asking whether we should just have a method getDataType()
that returns types of all columns of the dataframe.

Here is a scenario where users and algorithms will read only selected
columns from DataFrame:
- Users provided a DataFrame with columnA and columnB
- The 1st algorithm will read columnA and columnB, then computes/appends
columnC to the DataFrame.
- The 2nd algorithm will read columnB and columnC, then computes/appends
columnD to the DataFrame.
- Users need to read columnD in the final DataFrame.


> Regards,
> Dian
>
>
>
> On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi all,
> >
> > If there is no question related to this FLIP, we will start the voting
> > thread on 2/10.
> >
> > Regards,
> > Dong
> >
> > On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online
> > inference
> > > (Flink ML).
> > >
> > > The goal of this FLIP is to enable users to use the model trained by
> > Flink
> > > ML to do online inference. More details can be found in the FLIP doc at
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> > > .
> > >
> > > We are looking forward to your comments.
> > >
> > > Regards,
> > > Dong
> > >
> >
>

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Reply via email to