Hi Ron,

I found these names in other systems:
- `task_type` in BigQuery ML [1]
- `model_type` in Databricks [2]

`task` is just an abbreviated form of `task_type`.

[1]
https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate
[2]
https://www.databricks.com/blog/2022/04/19/model-evaluation-in-mlflow.html

Thanks,
Hao

On Tue, May 6, 2025 at 10:55 PM Ron Liu <ron9....@gmail.com> wrote:

> It's mainly used for model evaluation purposes for `ML_EVALUATE`.
> Different loss functions will be used and different metrics will be
> output for `ML_EVALUATE` based on the task option of the model. The task
> option is not necessary if the model is not used in `ML_EVALUATE`.
> `ML_EVALUATE` also has an overloaded method that can override the task
> type during evaluation.
>
> From your explanation, I personally feel that it might be more
> appropriate to replace "task" with a word better suited to the scenario,
> but of course I don't have a good suggestion at the moment.
>
> Best,
> Ron
>
> Hao Li <h...@confluent.io.invalid> wrote on Wed, May 7, 2025, at 11:24:
>
> > Hi Yunfeng, Ron,
> >
> > Thanks for the feedback.
> >
> > > it might be better to change the configuration api_key to apikey
> > Makes sense. I updated the FLIP.
> >
> > > Why is it necessary to define the task option in the WITH clause of
> > > the Model DDL, and what is its purpose?
> > It's mainly used for model evaluation purposes for `ML_EVALUATE`.
> > Different loss functions will be used and different metrics will be
> > output for `ML_EVALUATE` based on the task option of the model. The
> > task option is not necessary if the model is not used in
> > `ML_EVALUATE`. `ML_EVALUATE` also has an overloaded method that can
> > override the task type during evaluation.
> >
> > Apart from evaluation, in the future, if model training is supported
> > in Flink, it can also serve the purpose of specifying how the model
> > should be trained.
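To make the task-dependent metric selection concrete, here is a minimal sketch of how `ML_EVALUATE` could choose metrics from the model's task option, including an overload that overrides the task at evaluation time. This is not code from FLIP-525; the `Task` names and metric lists are illustrative assumptions only.

```java
import java.util.List;

public class TaskMetricsSketch {
    // Hypothetical task types; the FLIP's actual set may differ.
    enum Task { REGRESSION, CLASSIFICATION, CLUSTERING }

    // ML_EVALUATE picks its loss/metric set from the model's task option.
    static List<String> metricsFor(Task task) {
        return switch (task) {
            case REGRESSION -> List.of("MAE", "RMSE");
            case CLASSIFICATION -> List.of("ACCURACY", "F1");
            case CLUSTERING -> List.of("SILHOUETTE");
        };
    }

    // Mirrors the overloaded ML_EVALUATE that can override the model's
    // task type during evaluation.
    static List<String> metricsFor(Task modelTask, Task override) {
        return metricsFor(override != null ? override : modelTask);
    }

    public static void main(String[] args) {
        System.out.println(metricsFor(Task.REGRESSION));                      // [MAE, RMSE]
        System.out.println(metricsFor(Task.REGRESSION, Task.CLASSIFICATION)); // [ACCURACY, F1]
    }
}
```

The two-argument overload is only needed when evaluating a model against a task other than the one it was declared with, which matches the override behavior described above.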
> >
> > > About the CatalogModel interface, why does it need the
> > > `getInputSchema` and `getOutputSchema` methods? What is the role of
> > > Schema?
> > Schema mainly specifies the input and output data types of the model
> > when it is used in prediction. During prediction, `ML_PREDICT` takes
> > columns from the input table matching the model's input schema types
> > and outputs columns based on the model's output schema types.
> >
> > > Regarding the ModelProvider interface, what is the role of the copy
> > > method?
> > I think it can be useful in the future if we need to copy the provider
> > during the planning stage and apply mutations to it. But it may not be
> > used for now. I'm also OK with removing it.
> >
> > Hope this answers your questions.
> >
> > Thanks,
> > Hao
> >
> > On Tue, May 6, 2025 at 7:49 PM Ron Liu <ron9....@gmail.com> wrote:
> >
> > > Hi Hao,
> > >
> > > Thanks for starting this proposal, it's a great feature, +1.
> > >
> > > Since I was missing some context, I went back to FLIP-437. Combining
> > > these two FLIPs, I have the following three questions:
> > >
> > > 1. Why is it necessary to define the task option in the WITH clause
> > > of the Model DDL, and what is its purpose? I understand that one
> > > model can support various types of tasks such as regression,
> > > classification, clustering, etc. But the example you have given gives
> > > me the impression that a model can only perform a specific type of
> > > task, which confuses me. I think the task option is not needed.
> > >
> > > 2. About the CatalogModel interface, why does it need the
> > > `getInputSchema` and `getOutputSchema` methods? What is the role of
> > > Schema?
> > >
> > > 3. Regarding the ModelProvider interface, what is the role of the
> > > copy method? Since I don't know much about the implementation
> > > details, I'm curious about which cases need copying.
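The two interfaces discussed above could be sketched roughly as follows. This is a hypothetical simplification, not the actual FLIP-525 API: `Schema` is reduced to an ordered list of SQL type names, and the method set is assumed from this thread.

```java
import java.util.List;
import java.util.Map;

public class ModelInterfacesSketch {

    // Simplified stand-in for Flink's Schema: an ordered list of column
    // type names (the real Schema carries names, types, and constraints).
    record Schema(List<String> columnTypes) {}

    // ML_PREDICT matches input-table columns against getInputSchema() and
    // appends columns typed by getOutputSchema().
    interface CatalogModel {
        Schema getInputSchema();
        Schema getOutputSchema();
        Map<String, String> getOptions(); // e.g. the "task" option
    }

    // A provider the planner could copy and mutate during planning, which
    // is the use case discussed for the copy() method.
    interface ModelProvider {
        ModelProvider copy();
    }

    public static void main(String[] args) {
        CatalogModel model = new CatalogModel() {
            public Schema getInputSchema()  { return new Schema(List.of("STRING")); }
            public Schema getOutputSchema() { return new Schema(List.of("DOUBLE")); }
            public Map<String, String> getOptions() { return Map.of("task", "regression"); }
        };
        System.out.println(model.getInputSchema().columnTypes()); // [STRING]
        System.out.println(model.getOptions().get("task"));       // regression
    }
}
```

Under this reading, `copy()` exists so the planner can derive a mutated provider without touching the original, analogous to `DynamicTableSource#copy` in Flink's table connectors.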
> > >
> > > Best,
> > > Ron
> > >
> > > Yunfeng Zhou <flink.zhouyunf...@gmail.com> wrote on Wed, May 7,
> > > 2025, at 09:33:
> > >
> > > > Hi Hao,
> > > >
> > > > Thanks for the FLIP! It provides a clearer guideline for
> > > > developers to implement model functions.
> > > >
> > > > One minor comment: it might be better to change the configuration
> > > > api_key to apikey, which corresponds to
> > > > GlobalConfiguration.SENSITIVE_KEYS. Otherwise users' secrets might
> > > > be exposed in logs and cause security risks.
> > > >
> > > > Best,
> > > > Yunfeng
> > > >
> > > > > On Apr 29, 2025, at 07:22, Hao Li <h...@confluent.io.INVALID>
> > > > > wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start a discussion about FLIP-525 [1]: Model
> > > > > ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is
> > > > > co-authored with Shengkai Fang.
> > > > >
> > > > > This FLIP is a follow-up to FLIP-437 [2], proposing the
> > > > > implementation design for the ML_PREDICT and ML_EVALUATE
> > > > > functions that were introduced in FLIP-437.
> > > > >
> > > > > For more details, see FLIP-525 [1]. Looking forward to your
> > > > > feedback.
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
> > > > > [2]
> > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> > > > >
> > > > > Thanks,
> > > > > Hao