Hi Dong, Thanks for the reply. Got your points and it makes sense to me now~
Regards, Dian On Thu, Feb 9, 2023 at 3:45 PM Dong Lin <lindon...@gmail.com> wrote: > Hi Dian, > > Thanks for the review! Please see my reply inline. > > Regards, > Dong > > On Wed, Feb 8, 2023 at 11:58 AM Dian Fu <dian0511...@gmail.com> wrote: > > > Hi Dong, > > > > > > Thanks for driving this effort! This FLIP LGTM overall. I have just a few > > minor comments regarding the proposed API: > > > > 1) For the method `DataFrame.collect()`, why is it named `collect` > instead > > of something else, e.g. `get`? Does it mean that the result will be > > computed during this method call? > > > > I have chosen the names to be closer to the names of similar classes or > concepts in the existing popular open-source projects. > > The reasons for using `collect` include: > - mleap is a popular machine learning serving framework and it uses > LeapFrame#collect() > < > https://github.com/combust/mleap/blob/master/mleap-runtime/src/main/scala/ml/combust/mleap/runtime/frame/LeapFrame.scala#L13 > > > to represent this concept. > - Collect rows from table seems close to the concept of > TableResult#collect() in Flink. > - The method returns a *collection* of rows from this table, which makes > `collect()` is bit more relevant than `get()`. > > Currently, I expect DataFrame.collect() to just return the values already > computed before this method is called. > > > > 2) For the method `DataFrame.addColumn`, when will this method be used? > > > > For example, we might want to implement KMeansModelServable#transform > that appends the prediction result to the input dataFrame. This is similar > to the existing way that KMeansModel#transform > < > https://github.com/apache/flink-ml/blob/master/flink-ml-lib/src/main/java/org/apache/flink/ml/clustering/kmeans/KMeansModel.java#L84 > > > appends > the result to the input Table. Note that we probably don't want to copy the > entire input DataFrame to the outpuDataFrame. > > We will need to use DataFrame.addColumn to append the prediction result in > this case. > > > > 3) In the example `runOnlineInferenceOnWebServer`, it uses > > `output_df.getDataType("output")` to get the result type. It's not quite > > intuitive. Does it make sense to add a method `getDataType()` in > DataFrame? > > > I guess you are asking whether we should just have a method getDataType() > that returns types of all columns of the dataframe. > > Here is a scenario where users and algorithms will read only selected > columns from DataFrame: > - Users provided a DataFrame with columnA and columnB > - The 1st algorithm will read columnA and columnB, then computes/appends > columnC to the DataFrame. > - The 2nd algorithm will read columnB and columnC, then computes/appends > columnD to the DataFrame. > - Users need to read columnD in the final DataFrame. > > > > Regards, > > Dian > > > > > > > > On Tue, Feb 7, 2023 at 12:37 PM Dong Lin <lindon...@gmail.com> wrote: > > > > > Hi all, > > > > > > If there is no question related to this FLIP, we will start the voting > > > thread on 2/10. > > > > > > Regards, > > > Dong > > > > > > On Wed, Feb 1, 2023 at 8:38 PM Dong Lin <lindon...@gmail.com> wrote: > > > > > > > Hi all, > > > > > > > > Fan, Jiang, Zhipeng, and I have created FLIP-289: Support online > > > inference > > > > (Flink ML). > > > > > > > > The goal of this FLIP is to enable users to use the model trained by > > > Flink > > > > ML to do online inference. More details can be found in the FLIP doc > at > > > > > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240881268 > > > > . > > > > > > > > We are looking forward to your comments. > > > > > > > > Regards, > > > > Dong > > > > > > > > > >