Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Dian Fu Thu, 09 Feb 2023 21:53:43 -0800

Hi Dong,

Thanks for the reply. Got your points and it makes sense to me now~


Regards,
Dian

On Thu, Feb 9, 2023 at 3:45 PM Dong Lin <lindon...@gmail.com> wrote:

> Hi Dian,
>
> Thanks for the review! Please see my reply inline.
>
> Regards,
> Dong
>
> On Wed, Feb 8, 2023 at 11:58 AM Dian Fu <dian0511...@gmail.com> wrote:
>
> > Hi Dong,
> >
> >
> > Thanks for driving this effort! This FLIP LGTM overall. I have just a few
> > minor comments regarding the proposed API:
> >
> > 1) For the method `DataFrame.collect()`, why is it named `collect`
> instead
> > of something else, e.g. `get`? Does it mean that the result will be
> > computed during this method call?
> >
>
> I have chosen the names to be closer to the names of similar classes or
> concepts in the existing popular open-source projects.
>
> The reasons for using `collect` include:
> - mleap is a popular machine learning serving framework and it uses
> LeapFrame#collect()
> <
> https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/frame/LeapFrame.scala#L13
> >
> to represent this concept.
> - Collect rows from table seems close to the concept of
> TableResult#collect() in Flink.
> - The method returns a *collection* of rows from this table, which makes
> `collect()` is bit more relevant than `get()`.
>
> Currently, I expect DataFrame.collect() to just return the values already
> computed before this method is called.
>
>
> > 2) For the method `DataFrame.addColumn`, when will this method be used?
> >
>
> For example, we might want to implement KMeansModelServable#transform
> that appends the prediction result to the input dataFrame. This is similar
> to the existing way that KMeansModel#transform
> <
> https://github.com/apache/flink-ml/blob/master/flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java#L84
> >
> appends
> the result to the input Table. Note that we probably don't want to copy the
> entire input DataFrame to the outpuDataFrame.
>
> We will need to use DataFrame.addColumn to append the prediction result in
> this case.
>
>
> > 3) In the example `runOnlineInferenceOnWebServer`, it uses
> > `output_df.getDataType("output")` to get the result type. It's not quite
> > intuitive. Does it make sense to add a method `getDataType()` in
> DataFrame?
>
>
> I guess you are asking whether we should just have a method getDataType()
> that returns types of all columns of the dataframe.
>
> Here is a scenario where users and algorithms will read only selected
> columns from DataFrame:
> - Users provided a DataFrame with columnA and columnB
> - The 1st algorithm will read columnA and columnB, then computes/appends
> columnC to the DataFrame.
> - The 2nd algorithm will read columnB and columnC, then computes/appends
> columnD to the DataFrame.
> - Users need to read columnD in the final DataFrame.
>
>
> > Regards,
> > Dian
> >
> >
> >
> > On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <lindon...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > If there is no question related to this FLIP, we will start the voting
> > > thread on 2/10.
> > >
> > > Regards,
> > > Dong
> > >
> > > On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <lindon...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online
> > > inference
> > > > (Flink ML).
> > > >
> > > > The goal of this FLIP is to enable users to use the model trained by
> > > Flink
> > > > ML to do online inference. More details can be found in the FLIP doc
> at
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268
> > > > .
> > > >
> > > > We are looking forward to your comments.
> > > >
> > > > Regards,
> > > > Dong
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-289: Support online inference (Flink ML)

Reply via email to