Hi Ron,

I found these names in other systems:
- `task_type` in BigQuery ML [1]
- `model_type` in Databricks [2]

`task` is just an abbreviated form of `task_type`.

[1]
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate
[2]
https://www.databricks.com/blog/2022/04/19/model-evaluation-in-mlflow.html

Thanks,
Hao

On Tue, May 6, 2025 at 10:55 PM Ron Liu <ron9....@gmail.com> wrote:

> It's mainly used for model evaluation purposes for `ML_EVALUATE`.
> Different loss functions will be used and different metrics will be
> output for `ML_EVALUATE` based on the task option of the model. The task
> option is not necessary if the model is not used in `ML_EVALUATE`.
> `ML_EVALUATE` also has an overloaded method that can override the task
> type during evaluation.
>
> From your explanation, I personally feel that it might be more
> appropriate to replace "task" with a word better suited to the scenario,
> but of course I don't have a good suggestion at the moment.
>
> Best,
> Ron
>
> Hao Li <h...@confluent.io.invalid> wrote on Wed, May 7, 2025, at 11:24:
>
> > Hi Yunfeng, Ron,
> >
> > Thanks for the feedback.
> >
> > > it might be better to change the configuration api_key to apikey
> > Makes sense. I updated the FLIP.
> >
> > > Why is it necessary to define the task option in the WITH clause of
> > > the Model DDL, and what is its purpose?
> > It's mainly used for model evaluation purposes for `ML_EVALUATE`.
> > Different loss functions will be used and different metrics will be
> > output for `ML_EVALUATE` based on the task option of the model. The
> > task option is not necessary if the model is not used in
> > `ML_EVALUATE`. `ML_EVALUATE` also has an overloaded method that can
> > override the task type during evaluation.
> >
> > Apart from evaluation, in the future, if model training is supported
> > in Flink, it can also serve the purpose of specifying how the model
> > should be trained.
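To make the task-dependent metric selection concrete, here is a minimal sketch of how `ML_EVALUATE` could choose metrics from the model's task option, including an overload that overrides the task at evaluation time. This is not code from FLIP-525; the `Task` names and metric lists are illustrative assumptions only.

```java
import java.util.List;

public class TaskMetricsSketch {
    // Hypothetical task types; the FLIP's actual set may differ.
    enum Task { REGRESSION, CLASSIFICATION, CLUSTERING }

    // ML_EVALUATE picks its loss/metric set from the model's task option.
    static List<String> metricsFor(Task task) {
        return switch (task) {
            case REGRESSION -> List.of("MAE", "RMSE");
            case CLASSIFICATION -> List.of("ACCURACY", "F1");
            case CLUSTERING -> List.of("SILHOUETTE");
        };
    }

    // Mirrors the overloaded ML_EVALUATE that can override the model's
    // task type during evaluation.
    static List<String> metricsFor(Task modelTask, Task override) {
        return metricsFor(override != null ? override : modelTask);
    }

    public static void main(String[] args) {
        System.out.println(metricsFor(Task.REGRESSION));                      // [MAE, RMSE]
        System.out.println(metricsFor(Task.REGRESSION, Task.CLASSIFICATION)); // [ACCURACY, F1]
    }
}
```

The two-argument overload is only needed when evaluating a model against a task other than the one it was declared with, which matches the override behavior described above.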
> >
> > > About the CatalogModel interface, why does it need the
> > > `getInputSchema` and `getOutputSchema` methods? What is the role of
> > > Schema?
> > Schema mainly specifies the input and output data types of the model
> > when it is used in prediction. During prediction, `ML_PREDICT` takes
> > columns from the input table matching the model's input schema types
> > and outputs columns based on the model's output schema types.
> >
> > > Regarding the ModelProvider interface, what is the role of the copy
> > > method?
> > I think it can be useful in the future if we need to copy the provider
> > during the planning stage and apply mutations to it. But it may not be
> > used for now. I'm also OK with removing it.
> >
> > Hope this answers your questions.
> >
> > Thanks,
> > Hao
> >
> > On Tue, May 6, 2025 at 7:49 PM Ron Liu <ron9....@gmail.com> wrote:
> >
> > > Hi Hao,
> > >
> > > Thanks for starting this proposal, it's a great feature, +1.
> > >
> > > Since I was missing some context, I went back to FLIP-437. Combining
> > > these two FLIPs, I have the following three questions:
> > >
> > > 1. Why is it necessary to define the task option in the WITH clause
> > > of the Model DDL, and what is its purpose? I understand that one
> > > model can support various types of tasks such as regression,
> > > classification, clustering, etc. But the example you have given gives
> > > me the impression that a model can only perform a specific type of
> > > task, which confuses me. I think the task option is not needed.
> > >
> > > 2. About the CatalogModel interface, why does it need the
> > > `getInputSchema` and `getOutputSchema` methods? What is the role of
> > > Schema?
> > >
> > > 3. Regarding the ModelProvider interface, what is the role of the
> > > copy method? Since I don't know much about the implementation
> > > details, I'm curious about which cases need copying.
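The two interfaces discussed above could be sketched roughly as follows. This is a hypothetical simplification, not the actual FLIP-525 API: `Schema` is reduced to an ordered list of SQL type names, and the method set is assumed from this thread.

```java
import java.util.List;
import java.util.Map;

public class ModelInterfacesSketch {

    // Simplified stand-in for Flink's Schema: an ordered list of column
    // type names (the real Schema carries names, types, and constraints).
    record Schema(List<String> columnTypes) {}

    // ML_PREDICT matches input-table columns against getInputSchema() and
    // appends columns typed by getOutputSchema().
    interface CatalogModel {
        Schema getInputSchema();
        Schema getOutputSchema();
        Map<String, String> getOptions(); // e.g. the "task" option
    }

    // A provider the planner could copy and mutate during planning, which
    // is the use case discussed for the copy() method.
    interface ModelProvider {
        ModelProvider copy();
    }

    public static void main(String[] args) {
        CatalogModel model = new CatalogModel() {
            public Schema getInputSchema()  { return new Schema(List.of("STRING")); }
            public Schema getOutputSchema() { return new Schema(List.of("DOUBLE")); }
            public Map<String, String> getOptions() { return Map.of("task", "regression"); }
        };
        System.out.println(model.getInputSchema().columnTypes()); // [STRING]
        System.out.println(model.getOptions().get("task"));       // regression
    }
}
```

Under this reading, `copy()` exists so the planner can derive a mutated provider without touching the original, analogous to `DynamicTableSource#copy` in Flink's table connectors.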
> > >
> > > Best,
> > > Ron
> > >
> > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote on Wed, May 7,
> > > 2025, at 09:33:
> > >
> > > > Hi Hao,
> > > >
> > > > Thanks for the FLIP! It provides a clearer guideline for
> > > > developers to implement model functions.
> > > >
> > > > One minor comment: it might be better to change the configuration
> > > > api_key to apikey, which corresponds to
> > > > GlobalConfiguration.SENSITIVE_KEYS. Otherwise users' secrets might
> > > > be exposed in logs and cause security risks.
> > > >
> > > > Best,
> > > > Yunfeng
> > > >
> > > > > On Apr 29, 2025, at 07:22, Hao Li <h...@confluent.io.INVALID>
> > > > > wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a discussion about FLIP-525 [1]: Model
> > > > > ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is
> > > > > co-authored with Shengkai Fang.
> > > > >
> > > > > This FLIP is a follow-up to FLIP-437 [2], proposing the
> > > > > implementation design for the ML_PREDICT and ML_EVALUATE
> > > > > functions that were introduced in FLIP-437.
> > > > >
> > > > > For more details, see FLIP-525 [1]. Looking forward to your
> > > > > feedback.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
> > > > > [2]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> > > > >
> > > > > Thanks,
> > > > > Hao