[DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Dev,

Mingge, Chris and I would like to start a discussion about FLIP-437: Support ML Models in Flink SQL. This FLIP proposes supporting machine learning models in Flink SQL syntax so that users can CRUD models with Flink SQL and use models on Flink to do prediction with Flink data. The FLIP also proposes new model entities and changes to the catalog interface to support model CRUD operations in the catalog.

For more details, see FLIP-437 [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL

Thanks,
Mingge, Chris & Hao
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for your questions. These are good questions!

1. The polymorphic table function I was referring to takes a table as input and outputs a table. So the syntax would be like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported yet on Flink. So before it's supported, one option for the predict function is using a table function which can output multiple columns
```
SELECT * FROM my_table, LATERAL VIEW (ML_PREDICT('model', col1, col2))
```

2. Good question. Type inference is hard for the `ML_PREDICT` function because it takes a model name string as input. I can think of three ways of doing type inference for it.
1). Treat the `ML_PREDICT` function as something special: during SQL parsing or planning time, if it's encountered, we look up the model named by the first argument in the catalog. Then we can infer the input/output types for the function.
2). We can define a `model` keyword and use that in the predict function to indicate the argument refers to a model. So it's like `ML_PREDICT(model 'my_model', col1, col2)`
3). We can create a special type of table function, maybe called `ModelFunction`, which can resolve the model type inference by handling it specially during parsing or planning time.
1) is hacky, 2) isn't supported in Flink for functions, 3) might be a good option.

3. I sketched the `ML_PREDICT` function for inference. But there are limitations of the function mentioned in 1 and 2. So maybe we don't need to introduce them as built-in functions until polymorphic table functions are supported and we can properly deal with type inference. After that, defining a user-defined model function should also be straightforward.

4. For model types, do you mean 'remote', 'import', 'native' models or other things?

5. We could support popular providers such as 'azureml', 'vertexai', 'googleai' as long as we support the `ML_PREDICT` function.
Users should be able to implement 3rd-party providers if they can implement a function handling the input/output for the provider. I think for the model functions, there are still dependencies or hacks we need to sort out as a built-in function. Maybe we can separate that as a follow up if we want to have it built-in and focus on the model syntax for this FLIP? Thanks, Hao On Tue, Mar 12, 2024 at 10:33 PM Jark Wu wrote: > Hi Minge, Chris, Hao, > > Thanks for proposing this interesting idea. I think this is a nice step > towards > the AI world for Apache Flink. I don't know much about AI/ML, so I may have > some stupid questions. > > 1. Could you tell more about why polymorphism table function (PTF) doesn't > work and do we have plan to use PTF as model functions? > > 2. What kind of object does the model map to in SQL? A relation or a data > type? > It looks like a data type because we use it as a parameter of the table > function. > If it is a data type, how does it cooperate with type inference[1]? > > 3. What built-in model functions will we support? How to define a > user-defined model function? > > 4. What built-in model types will we support? How to define a user-defined > model type? > > 5. Regarding the remote model, what providers will we support? Can users > implement > 3rd-party providers except OpenAI? > > Best, > Jark > > [1]: > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference
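The catalog-lookup approach Hao lists as option 1) — intercept `ML_PREDICT`, resolve the model name from the catalog, and derive the function's input/output types from the stored model schema — can be sketched in a few lines. Everything below (`ToyCatalog`, `resolve_ml_predict`, the schema shapes) is hypothetical illustration of the planner-side logic, not Flink API:

```python
from dataclasses import dataclass

@dataclass
class ModelSchema:
    inputs: list   # (name, type) pairs the model expects
    outputs: list  # (name, type) pairs the model produces

class ToyCatalog:
    """Stand-in for a catalog that can store model definitions."""
    def __init__(self):
        self._models = {}

    def register_model(self, name, schema):
        self._models[name] = schema

    def get_model(self, name):
        if name not in self._models:
            raise LookupError(f"model '{name}' not found in catalog")
        return self._models[name]

def resolve_ml_predict(catalog, model_name, arg_types):
    """Infer ML_PREDICT's result type by looking the model up in the catalog."""
    schema = catalog.get_model(model_name)
    expected = [t for _, t in schema.inputs]
    if arg_types != expected:
        raise TypeError(f"ML_PREDICT arguments {arg_types} do not match "
                        f"model inputs {expected}")
    return schema.outputs

catalog = ToyCatalog()
catalog.register_model(
    "my_model",
    ModelSchema(inputs=[("f1", "DOUBLE"), ("f2", "DOUBLE")],
                outputs=[("label", "STRING")]))

# Planning ML_PREDICT('my_model', col1, col2) where both columns are DOUBLE:
print(resolve_ml_predict(catalog, "my_model", ["DOUBLE", "DOUBLE"]))
```

This also shows why the option is "hacky": the function's type depends on a catalog lookup keyed by a string literal, which ordinary function type inference does not do.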
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for the pointers. They're very helpful.

1. Looks like `tumble`, `hopping` are keywords in the Calcite parser. And the syntax `cumulate(Table my_table, ...)` needs to get table information from the catalog somewhere for type validation etc. Can you send me some pointers to where the function gets the table information?

2. The ideal syntax for the model function, I think, would be `ML_PREDICT(MODEL <model_name>, {table | (query_stmt)})`. I think with special handling of the `ML_PREDICT` function in the parser/planner, maybe we can do this like window functions. But to support the `MODEL` keyword, we need a Calcite parser change I guess. Also, is it possible to support a query in window functions in addition to a table?

For the Redshift syntax, I'm not sure of the purpose of defining the function name with the model. Is it to define the function input/output schema? We have the schema in our create model syntax and `ML_PREDICT` can handle it by getting the model definition. I think our syntax is more concise in having a generic prediction function. I also did some research and it's the syntax used by Databricks `ai_query` [1], Snowflake `predict` [2], Azureml `predict` [3].

[1]: https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
[2]: https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
[3]: https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current

Thanks,
Hao

On Wed, Mar 13, 2024 at 8:57 PM Jark Wu wrote: > Hi Mingge, Hao, > > Thanks for your replies. > > > PTF is actually the ideal approach for model functions, and we do have > the plans to use PTF for > all model functions (including prediction, evaluation etc..) once the PTF > is supported in FlinkSQL > confluent extension. > > It sounds that PTF is the ideal way and table function is a temporary > solution which will be dropped in the future.
> I'm not sure whether we can implement it using PTF in Flink SQL. But we > have implemented window > functions using PTF[1]. And introduced a new window function (called > CUMULATE[2]) in Flink SQL based > on this. I think it might work to use PTF and implement model function > syntax like this: > > SELECT * FROM TABLE(ML_PREDICT( > TABLE my_table, > my_model, > col1, > col2 > )); > > Besides, did you consider following the way of AWS Redshift which defines > model function with the model itself together? > IIUC, a model is a black-box which defines input parameters and output > parameters which can be modeled into functions. > > > Best, > Jark > > [1]: > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session > [2]: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows > [3]: > > https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for the pointer. Sorry for the confusion: I meant how the table name in a window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVFs specially in the parser/planner so that the catalog is looked up when they are seen?

For what a model is, I'm wondering if it has to be a datatype or relation. Can it be another kind of citizen parallel to datatype/relation/function/db? Redshift also supports the `show models` operation, so it seems it's treated specially as well?

The reasons I don't like Redshift's syntax are:
1. It's a bit verbose: you need to think of a model name as well as a function name, and the function name also needs to be unique.
2. More importantly, the prediction function isn't the only function that can operate on models. There could be a set of inference functions [1] and evaluation functions [2] which can operate on models. It's hard to specify all of them in model creation.

[1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
[2]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate

Thanks,
Hao

On Thu, Mar 14, 2024 at 8:18 PM Jark Wu wrote: > Hi Hao, > > > Can you send me some pointers > where the function gets the table information? > > Here is the code of cumulate window type checking [1]. > > > Also is it possible to support a query in > window functions in addition to a table? > > Yes. It is not allowed in TVF. > > Thanks for the syntax links of other systems. The reason I prefer the > Redshift way is > that it avoids introducing Model as a relation or datatype (referenced as a > parameter in TVF). > Model is not a relation because it cannot be queried directly (e.g., SELECT * > FROM model). > I'm also confused about making Model a datatype, because I don't know > what class the > model parameter of the eval method of TableFunction/ScalarFunction should > be.
By defining > the function with the model, users can directly invoke the function without > reference to the model name. > > Best, > Jark > > [1]: > > https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
create a FLIP for just this syntax question and collaborate > >> with the Calcite community on this. Supporting the syntax of Oracle > >> shouldn't be too hard to convince at least as parser parameter. > >> > >> Regards, > >> Timo > >> > >> [1] > >> > >> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-440%3A+User-defined+Polymorphic+Table+Functions > >> [2] > >> > >> > https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_TF.html#GUID-0F66E239-DE77-4C0E-AC76-D5B632AB8072 > >> [3] > https://oracle-base.com/articles/18c/polymorphic-table-functions-18c > >> > >> > >> > >> On 20.03.24 17:22, Mingge Deng wrote: > >> > Thanks Jark for all the insightful comments. > >> > > >> > We have updated the proposal per our offline discussions: > >> > 1. Model will be treated as a new relation in FlinkSQL. > >> > 2. Include the common ML predict and evaluate functions into the open > >> > source flink to complete the user journey. > >> > And we should be able to extend the calcite SqlTableFunction to > >> support > >> > these two ML functions. > >> > > >> > Best, > >> > Mingge > >> > > >> > On Mon, Mar 18, 2024 at 7:05 PM Jark Wu wrote: > >> > > >> >> Hi Hao, > >> >> > >> >>> I meant how the table name > >> >> in window TVF gets translated to `SqlCallingBinding`. Probably we > need > >> to > >> >> fetch the table definition from the catalog somewhere. Do we treat > >> those > >> >> window TVF specially in parser/planner so that catalog is looked up > >> when > >> >> they are seen? > >> >> > >> >> The table names are resolved and validated by Calcite SqlValidator. > We > >> >> don't need to fetch from catalog manually. > >> >> The specific checking logic of cumulate window happens in > >> >> SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes. > >> >> The return type of SqlCumulateTableFunction is defined in > >> >> #getRowTypeInference() method.
> >> >> Both are public interfaces provided by Calcite and it seems it's not > >> >> specially handled in parser/planner. > >> >> > >> >> I didn't try that, but my gut feeling is that the framework is ready to > >> >> extend a customized TVF. > >> >> > >> >>> For what model is, I'm wondering if it has to be datatype or relation. > >> >> Can > >> >> it be another kind of citizen parallel to > >> datatype/relation/function/db? > >> >> Redshift also supports `show models` operation, so it seems it's > >> treated > >> >> specially as well? > >> >> > >> >> If it is an entity only used in catalog scope (e.g., show xxx, create > >> xxx, > >> >> drop xxx), it is fine to introduce it. > >> >> We have introduced such one before, called Module: "load module", "show > >> >> modules" [1]. > >> >> But if we want to use Model in TVF parameters, it means it has to be a > >> >> relation or datatype, because > >> >> that is what it only accepts now. > >> >> > >> >> Thanks for sharing the reason of preferring TVF instead of Redshift > >> way. It > >> >> sounds reasonable to me. > >> >> > >> >> Best, > >> >> Jark > >> >> > >> >> [1]: > >> >> > >> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
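Jark's point about catalog-scope entities (create xxx, drop xxx, show xxx — like the existing "LOAD MODULE" / "SHOW MODULES") can be made concrete with a small life-cycle sketch. All names here (`ModelCatalog`, the method signatures) are illustrative, not a proposed Flink interface:

```python
class ModelCatalog:
    """Toy catalog treating MODEL as a catalog-scope entity with a
    CREATE / DROP / SHOW life cycle, analogous to modules in Flink."""

    def __init__(self):
        self._models = {}

    def create_model(self, name, definition, ignore_if_exists=False):
        if name in self._models and not ignore_if_exists:
            raise ValueError(f"model '{name}' already exists")
        self._models.setdefault(name, definition)

    def drop_model(self, name, ignore_if_not_exists=False):
        if name not in self._models:
            if ignore_if_not_exists:
                return
            raise LookupError(f"model '{name}' does not exist")
        del self._models[name]

    def show_models(self):
        return sorted(self._models)

cat = ModelCatalog()
cat.create_model("classifier_model", {"provider": "openai"})
print(cat.show_models())
```

The catch Jark raises still applies: this life cycle alone doesn't make a model usable as a TVF parameter, which is why the relation-vs-datatype question matters.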
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Ahmed,

Looks like the feature freeze time for the 1.20 release is June 15th. We can definitely get the model DDL into 1.20. For the predict and evaluate functions, if we can't get them into the 1.20 release, we can get them into the 1.21 release for sure.

Thanks,
Hao

On Mon, Mar 25, 2024 at 1:25 AM Timo Walther wrote: > Hi Jark and Hao, > > thanks for the information, Jark! Great that the Calcite community > already fixed the problem for us. +1 to adopt the simplified syntax > asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if > upgrading Calcite is too much work right now? > > > Is `DESCRIPTOR` a must in the syntax? > > Yes, we should still stick to the standard as much as possible and all > vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal > arguments. So the final syntax of this discussion would be: > > > SELECT f1, f2, label FROM >ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2)) > > SELECT * FROM >ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2)) > > Please double check if this is implementable with the current stack. I > fear the parser or validator might not like the "identifier" argument? > > Make sure that also these variations are supported: > > SELECT f1, f2, label FROM >ML_PREDICT( > TABLE `my_data`, > my_cat.my_db.classifier_model, > DESCRIPTOR(f1, f2)) > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE `my_data`, > model => my_cat.my_db.classifier_model, > args => DESCRIPTOR(f1, f2)) > > It might be safer and more future proof to wrap a MODEL() function > around it.
This would be more in sync with the standard that actually > still requires to put a TABLE() around the input argument: > > ML_PREDICT(TABLE(`my_data`) PARTITIONED BY c1 ORDERED BY c1, ) > > So the safest option would be the long-term solution: > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE(my_data), > model => MODEL(my_cat.my_db.classifier_model), > args => DESCRIPTOR(f1, f2)) > > But I'm fine with this if others have a strong opinion: > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE `my_data`, > model => my_cat.my_db.classifier_model, > args => DESCRIPTOR(f1, f2)) > > Some feedback for the remainder of the FLIP: > > 1) Simplify catalog objects > > I would suggest to drop: > CatalogModel.getModelKind() > CatalogModel.getModelTask() > > A catalog object should fully resemble the DDL. And since the DDL puts > those properties in the WITH clause, the catalog object should do the same > (i.e. put them into the `getModelOptions()`). Btw renaming this method > to just `getOptions()` for consistency should be good as well. > Internally, we can still provide enums for these frequently used > classes. Similar to what we do in `FactoryUtil` for other frequently > used options. > > Remove `getDescription()` and `getDetailedDescription()`. They were a > mistake for CatalogTable and should actually be deprecated. They got > replaced by `getComment()` which is sufficient. > > 2) CREATE TEMPORARY MODEL is not supported. > > This is an unnecessary restriction. We should support temporary versions > of these catalog objects as well for consistency. Adding support for > this should be straightforward. > > 3) { DESCRIBE | DESC } MODEL [catalog_name.][database_name.]model_name > > I would suggest we support `SHOW CREATE MODEL` instead. Similar to `SHOW > CREATE TABLE`, this should show all properties. If we support `DESCRIBE > MODEL` it should only list the input parameters similar to `DESCRIBE > TABLE` only shows the columns (not the WITH clause).
> > Regards, > Timo > > > On 23.03.24 13:17, Ahmed Hamdy wrote: > > Hi everyone, > > +1 for this proposal, I believe it is very useful to the minimum, It > would > > be great even having "ML_PREDICT" and "ML_EVALUATE" as built-in PTFs in > > this FLIP as discussed. > > IIUC this will be included in the 1.20 roadmap? > > Best Regards > > Ahmed Hamdy > > > > > > On Fri, 22 Mar 2024 at 23:54, Hao Li wrote: > > > >> Hi Timo and Jark, > >> > >> I agree Oracle's syntax seems concise and more descriptive. For the > >> built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with Jark we > can > >> support them as built-in PTF using `SqlTableFunction` for this FLIP. We > can > >> have a different FLIP discussing user defined PTF and adopt that later > for > >> model functions later. To summarize, the current propos
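Timo's catalog-object simplification — drop `getModelKind()`/`getModelTask()`, fold those properties into the options map so the object mirrors the DDL's WITH clause, and keep only `getOptions()` and `getComment()` — could look roughly like the following toy sketch. It is written in Python purely for illustration; the real `CatalogModel` would be a Java interface in Flink, and the helper name `model_task` is an assumption:

```python
class CatalogModel:
    """Toy catalog object that fully resembles the CREATE MODEL DDL:
    everything from the WITH clause lives in one options map."""

    def __init__(self, input_schema, output_schema, options, comment=None):
        self.input_schema = input_schema    # columns the model consumes
        self.output_schema = output_schema  # columns the model produces
        self._options = dict(options)       # mirrors the DDL WITH clause
        self._comment = comment

    def get_options(self):
        return self._options

    def get_comment(self):
        return self._comment

    # Frequently used options can still be surfaced through helpers,
    # similar in spirit to what FactoryUtil does for table options.
    def model_task(self):
        return self._options.get("task")

model = CatalogModel(
    input_schema=[("f1", "DOUBLE"), ("f2", "DOUBLE")],
    output_schema=[("label", "STRING")],
    options={"task": "classification", "provider": "openai"},
    comment="spam classifier",
)
print(model.model_task())
```

Note how kind/task are deduced from the options rather than being first-class accessors, matching point 1) of the feedback.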
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Timo,

> Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument?

I checked this: currently the validator throws an exception trying to get the fully qualified name for `classifier_model`. But since `SqlValidatorImpl` is implemented in Flink, we should be able to fix this. The only caveat is that if the full model path is not provided, the qualifier is interpreted as a column. We should be able to handle this specially by rewriting the `ml_predict` function to add the catalog and database name in `FlinkCalciteSqlValidator`, though.

> SELECT f1, f2, label FROM ML_PREDICT( TABLE `my_data`, my_cat.my_db.classifier_model, DESCRIPTOR(f1, f2)) SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2))

I verified these can be parsed. The problem is in the validator for the qualifier, as mentioned above.

> So the safest option would be the long-term solution: SELECT f1, f2, label FROM ML_PREDICT( input => TABLE(my_data), model => MODEL(my_cat.my_db.classifier_model), args => DESCRIPTOR(f1, f2))

`TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't work since `TABLE` and `MODEL` are already keywords in Calcite used by `CREATE TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be treated as a function. So I think

SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2))

should be fine for now.

For the syntax part:
1). Sounds good. We can drop model task and model kind from the definition. They can be deduced from the options.
2). Sure. We can add temporary models.
3). Makes sense. We can use `show create model <model_name>` to display all information and `describe model <model_name>` to show the input/output schema.

Thanks,
Hao

On Mon, Mar 25, 2024 at 3:21 PM Hao Li wrote: > Hi Ahmed, > > Looks like the feature freeze time for 1.20 release is June 15th.
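The rewrite Hao describes — expanding a partial model path with the current catalog and database before validation — boils down to standard identifier qualification. A minimal sketch, where the `qualify` helper and its return shape are assumptions for illustration:

```python
def qualify(identifier, current_catalog, current_database):
    """Expand a 1-, 2- or 3-part identifier to [catalog, database, object],
    filling missing parts from the session's current catalog/database."""
    parts = identifier.split(".")
    if len(parts) == 1:
        return [current_catalog, current_database, parts[0]]
    if len(parts) == 2:
        return [current_catalog] + parts
    if len(parts) == 3:
        return parts
    raise ValueError(f"too many parts in identifier: {identifier}")

# A bare model name resolves against the session defaults:
print(qualify("classifier_model", "my_cat", "my_db"))
```

With this expansion done before lookup, a bare `classifier_model` argument can be resolved as a model rather than being misread as a column.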
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Timo, Yeah. For `primary key` and `from table(...)` those are explicitly matched in parser: [1]. > SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); This named argument syntax looks good to me. It can be supported together with SELECT f1, f2, label FROM ML_PREDICT(`my_data`, `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); Sure. Will let you know once updated the FLIP. [1] https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 Thanks, Hao On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote: > Hi Hao, > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > > work since `TABLE` and `MODEL` are already key words > > This argument doesn't count. The parser supports introducing keywords > that are still non-reserved. For example, this enables using "key" for > both primary key and a column name: > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > WITH ('connector' = 'datagen'); > > SELECT i AS key FROM t; > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what > the standard dictates. But for now, let's use the most compact syntax > possible which is also in sync with Oracle. > > TLDR: We allow identifiers as arguments for PTFs which are expanded with > catalog and database if necessary. Those identifier arguments translate > to catalog lookups for table and models. The ML_ functions will make > sure that the arguments are of correct type model or table. > > SELECT f1, f2, label FROM >ML_PREDICT( > input => `my_data`, > model => `my_cat`.`my_db`.`classifier_model`, > args => DESCRIPTOR(f1, f2)); > > So this will allow us to also use in the future: > > SELECT * FROM poly_func(table1); > > Same support as Oracle [1]. Very concise. > > Let me know when you updated the FLIP for a final review before voting. > > Do others have additional objections? 
> > Regards, > Timo > > [1] > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark, I think we can start with supporting popular model providers such as openai, azureml, sagemaker for remote models. Thanks, Hao On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > Thanks for the PoC and updating, > > The final syntax looks good to me, at least it is a nice and concise first > step. > > SELECT f1, f2, label FROM >ML_PREDICT( > input => `my_data`, > model => `my_cat`.`my_db`.`classifier_model`, > args => DESCRIPTOR(f1, f2)); > > Besides, what built-in models will we support in the FLIP? This might be > important > because it relates to what use cases can run with the new Flink version out > of the box. > > Best, > Jark > > On Wed, 27 Mar 2024 at 01:10, Hao Li wrote: > > > Hi Timo, > > > > Yeah. For `primary key` and `from table(...)` those are explicitly > matched > > in parser: [1]. > > > > > SELECT f1, f2, label FROM > >ML_PREDICT( > > input => `my_data`, > > model => `my_cat`.`my_db`.`classifier_model`, > > args => DESCRIPTOR(f1, f2)); > > > > This named argument syntax looks good to me. It can be supported together > > with > > > > SELECT f1, f2, label FROM ML_PREDICT(`my_data`, > > `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); > > > > Sure. Will let you know once updated the FLIP. > > > > [1] > > > > > https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 > > > > Thanks, > > Hao > > > > On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote: > > > > > Hi Hao, > > > > > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > > > > work since `TABLE` and `MODEL` are already key words > > > > > > This argument doesn't count. The parser supports introducing keywords > > > that are still non-reserved. 
For example, this enables using "key" for > > > both primary key and a column name: > > > > > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > > > WITH ('connector' = 'datagen'); > > > > > > SELECT i AS key FROM t; > > > > > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what > > > the standard dictates. But for now, let's use the most compact syntax > > > possible which is also in sync with Oracle. > > > > > > TLDR: We allow identifiers as arguments for PTFs which are expanded > with > > > catalog and database if necessary. Those identifier arguments translate > > > to catalog lookups for table and models. The ML_ functions will make > > > sure that the arguments are of correct type model or table. > > > > > > SELECT f1, f2, label FROM > > >ML_PREDICT( > > > input => `my_data`, > > > model => `my_cat`.`my_db`.`classifier_model`, > > > args => DESCRIPTOR(f1, f2)); > > > > > > So this will allow us to also use in the future: > > > > > > SELECT * FROM poly_func(table1); > > > > > > Same support as Oracle [1]. Very concise. > > > > > > Let me know when you updated the FLIP for a final review before voting. > > > > > > Do others have additional objections? > > > > > > Regards, > > > Timo > > > > > > [1] > > > > > > > > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html > > > > > > > > > > > > On 25.03.24 23:40, Hao Li wrote: > > > > Hi Timo, > > > > > > > >> Please double check if this is implementable with the current > stack. I > > > > fear the parser or validator might not like the "identifier" > argument? > > > > > > > > I checked this, currently the validator throws an exception trying to > > get > > > > the full qualifier name for `classifier_model`. But since > > > > `SqlValidatorImpl` is implemented in Flink, we should be able to fix > > > this. > > > > The only caveat is that if the full model path is not provided, > > > > the qualifier is interpreted as a column.
We should be able to > special > > > > handle this by rewriting the `ml_predict` function to add the catalog > > and > > > > database name in `FlinkCalciteSqlValidator` though. > > > > > > > >> SELECT f1, f2, label FROM >
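Stripped of the reply markers, the two equivalent call forms being agreed on in the messages above are:

```sql
-- Named-argument form:
SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));

-- Positional form:
SELECT f1, f2, label FROM
  ML_PREDICT(`my_data`, `my_cat`.`my_db`.`classifier_model`, DESCRIPTOR(f1, f2));
```

Both forms are taken verbatim from the thread; identifier arguments are expanded with the current catalog and database when a full path is not given.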
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Thanks Timo. I'll start a vote tomorrow if no further discussion. Thanks, Hao On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote: > Hi everyone, > > I updated the FLIP according to this discussion. > > @Hao Li: Let me know if I made a mistake somewhere. I added some > additional explaining comments about the new PTF syntax. > > There are no further objections from my side. If nobody objects, Hao > feel free to start the voting tomorrow. > > Regards, > Timo > > > On 28.03.24 16:30, Jark Wu wrote: > > Thanks, Hao, > > > > Sounds good to me. > > > > Best, > > Jark > > > > On Thu, 28 Mar 2024 at 01:02, Hao Li wrote: > > > >> Hi Jark, > >> > >> I think we can start with supporting popular model providers such as > >> openai, azureml, sagemaker for remote models. > >> > >> Thanks, > >> Hao > >> > >> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > >> > >>> Thanks for the PoC and updating, > >>> > >>> The final syntax looks good to me, at least it is a nice and concise > >> first > >>> step. > >>> > >>> SELECT f1, f2, label FROM > >>> ML_PREDICT( > >>> input => `my_data`, > >>> model => `my_cat`.`my_db`.`classifier_model`, > >>> args => DESCRIPTOR(f1, f2)); > >>> > >>> Besides, what built-in models will we support in the FLIP? This might > be > >>> important > >>> because it relates to what use cases can run with the new Flink version > >> out > >>> of the box. > >>> > >>> Best, > >>> Jark > >>> > >>> On Wed, 27 Mar 2024 at 01:10, Hao Li wrote: > >>> > >>>> Hi Timo, > >>>> > >>>> Yeah. For `primary key` and `from table(...)` those are explicitly > >>> matched > >>>> in parser: [1]. > >>>> > >>>>> SELECT f1, f2, label FROM > >>>> ML_PREDICT( > >>>> input => `my_data`, > >>>> model => `my_cat`.`my_db`.`classifier_model`, > >>>> args => DESCRIPTOR(f1, f2)); > >>>> > >>>> This named argument syntax looks good to me. 
It can be supported > >> together > >>>> with > >>>> > >>>> SELECT f1, f2, label FROM ML_PREDICT(`my_data`, > >>>> `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); > >>>> > >>>> Sure. Will let you know once updated the FLIP. > >>>> > >>>> [1] > >>>> > >>>> > >>> > >> > https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 > >>>> > >>>> Thanks, > >>>> Hao > >>>> > >>>> On Tue, Mar 26, 2024 at 4:15 AM Timo Walther > >> wrote: > >>>> > >>>>> Hi Hao, > >>>>> > >>>>> > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` > >> doesn't > >>>>> > work since `TABLE` and `MODEL` are already key words > >>>>> > >>>>> This argument doesn't count. The parser supports introducing keywords > >>>>> that are still non-reserved. For example, this enables using "key" > >> for > >>>>> both primary key and a column name: > >>>>> > >>>>> CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > >>>>> WITH ('connector' = 'datagen'); > >>>>> > >>>>> SELECT i AS key FROM t; > >>>>> > >>>>> I'm sure we will introduce `TABLE(my_data)` eventually as this is > >> what > >>>>> the standard dictates. But for now, let's use the most compact syntax > >>>>> possible which is also in sync with Oracle. > >>>>> > >>>>> TLDR: We allow identifiers as arguments for PTFs which are expanded > >>> with > >>>>> catalog and database if necessary. Those identifier arguments > >> translate > >>>>> to catalog lookups for table and models. The ML_ functions will make > >>>>> sure that the arguments are of correct type model or table. > >>>>> > >
[VOTE] FLIP-437: Support ML Models in Flink SQL
Hi devs, I'd like to start a vote on the FLIP-437: Support ML Models in Flink SQL [1]. The discussion thread is here [2]. The vote will be open for at least 72 hours unless there is an objection or insufficient votes. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn Thanks, Hao
Re: [VOTE] FLIP-437: Support ML Models in Flink SQL
Thanks David Radley and David Moravek for the comments. I'll reply in the discussion thread. Hao On Wed, Apr 3, 2024 at 5:45 AM David Morávek wrote: > +1 (binding) > > My only suggestion would be to move Catalog changes into a separate > interface to allow us to begin with lower stability guarantees. Existing > Catalogs would be able to opt-in by implementing it. It's a minor thing > though, overall the FLIP is solid and the direction is pretty exciting. > > Best, > D. > > On Wed, Apr 3, 2024 at 2:31 AM David Radley > wrote: > > > Hi Hao, > > I don’t think this counts as an objection, I have some comments. I should > > have put this on the discussion thread earlier but have just got to this. > > - I suggest we can put a model version in the model resource. Versions > are > > notoriously difficult to add later; I don’t think we want to proliferate > > differently named models as a model mutates. We may want to work with > > non-latest models. > > - I see that the model name is the unique identifier. I realise this > would > > move away from the Oracle syntax – so may not be feasible short term; > but I > > wonder if we can have: > > - a uuid as the main identifier and the model name as an attribute. > > or > > - a namespace (or something like a system of origin) > > to help organise models with the same name. > > - does the model have an owner? I assume that Flink model resource is the > > master of the model? I imagine in the future that a model that comes in > via > > a new connector could be kept up to date with the external model and > would > > not be allowed to be changed by anything other than the connector. > > > >Kind regards, David. > > > > From: Hao Li > > Date: Friday, 29 March 2024 at 16:30 > > To: dev@flink.apache.org > > Subject: [EXTERNAL] [VOTE] FLIP-437: Support ML Models in Flink SQL > > Hi devs, > > > > I'd like to start a vote on the FLIP-437: Support ML Models in Flink > > SQL [1]. The discussion thread is here [2]. 
> > > > The vote will be open for at least 72 hours unless there is an objection > or > > insufficient votes. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn > > > > Thanks, > > Hao > > > > Unless otherwise stated above: > > > > IBM United Kingdom Limited > > Registered in England and Wales with number 741598 > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > >
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Cross-posting David Radley's comments here from the voting thread: > I don’t think this counts as an objection, I have some comments. I should have put this on the discussion thread earlier but have just got to this. > - I suggest we can put a model version in the model resource. Versions are notoriously difficult to add later; I don’t think we want to proliferate differently named models as a model mutates. We may want to work with non-latest models. > - I see that the model name is the unique identifier. I realise this would move away from the Oracle syntax – so may not be feasible short term; but I wonder if we can have: > - a uuid as the main identifier and the model name as an attribute. > or > - a namespace (or something like a system of origin) > to help organise models with the same name. > - does the model have an owner? I assume that Flink model resource is the master of the model? I imagine in the future that a model that comes in via a new connector could be kept up to date with the external model and would not be allowed to be changed by anything other than the connector. Thanks for the comments. I agree supporting the model version is important. I think we could support versioning without changing the overall syntax by appending a version number/name to the model name. Catalog implementations can handle the versions. For example, CREATE MODEL `my-model$1`... "$1" would imply it's version 1. If no version is provided, we can auto-increment the version if the model name exists already or create the first version if the model name doesn't exist yet. As for model ownership, I'm not entirely sure about the use case and how it should be controlled. It could be controlled from the user side through ACL/RBAC or some way in the catalog I guess. Maybe we can follow up on this as the requirement or use case becomes clearer. 
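As a sketch, the version-suffix convention described above could look like the following (the INPUT/OUTPUT clauses and options are illustrative assumptions, not something specified in this thread):

```sql
-- Explicit version: the catalog parses the "$1" suffix as version 1.
CREATE MODEL `my-model$1`
INPUT (f1 DOUBLE, f2 DOUBLE)
OUTPUT (label STRING)
WITH ('provider' = 'openai');

-- No suffix: the catalog creates version 1 if `my-model` does not exist yet,
-- or auto-increments to the next version if it already does.
CREATE MODEL `my-model`
INPUT (f1 DOUBLE, f2 DOUBLE)
OUTPUT (label STRING)
WITH ('provider' = 'openai');
```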
Cross-posting David Moravek's comments from the voting thread: > My only suggestion would be to move Catalog changes into a separate > interface to allow us to begin with lower stability guarantees. Existing > Catalogs would be able to opt-in by implementing it. It's a minor thing > though, overall the FLIP is solid and the direction is pretty exciting. I think it's fine to move model-related catalog changes to a separate interface and let the current catalog interface extend it. As model support will be built into Flink, the current catalog interface will need to support model CRUD operations. For my own education, can you elaborate more on how a separate interface will allow us to begin with lower stability guarantees? Thanks, Hao On Thu, Mar 28, 2024 at 10:14 AM Hao Li wrote: > Thanks Timo. I'll start a vote tomorrow if no further discussion. > > Thanks, > Hao > > On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote: > >> Hi everyone, >> >> I updated the FLIP according to this discussion. >> >> @Hao Li: Let me know if I made a mistake somewhere. I added some >> additional explaining comments about the new PTF syntax. >> >> There are no further objections from my side. If nobody objects, Hao >> feel free to start the voting tomorrow. >> >> Regards, >> Timo >> >> >> On 28.03.24 16:30, Jark Wu wrote: >> > Thanks, Hao, >> > >> > Sounds good to me. >> > >> > Best, >> > Jark >> > >> > On Thu, 28 Mar 2024 at 01:02, Hao Li wrote: >> > >> >> Hi Jark, >> >> >> >> I think we can start with supporting popular model providers such as >> >> openai, azureml, sagemaker for remote models. >> >> >> >> Thanks, >> >> Hao >> >> >> >> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: >> >> >> >>> Thanks for the PoC and updating, >> >>> >> >>> The final syntax looks good to me, at least it is a nice and concise >> >> first >> >>> step. 
>> >>> >> >>> SELECT f1, f2, label FROM >> >>> ML_PREDICT( >> >>> input => `my_data`, >> >>> model => `my_cat`.`my_db`.`classifier_model`, >> >>> args => DESCRIPTOR(f1, f2)); >> >>> >> >>> Besides, what built-in models will we support in the FLIP? This might >> be >> >>> important >> >>> because it relates to what use cases can run with the new Flink >> version >> >> out >> >>> of the box. >> >>> >> >>> Best, >> >>> Jark >> >>> >> >>> On Wed, 27 Mar 2024 at 01:10, Hao Li >> wrote: >> >>> >
Re: [VOTE] FLIP-437: Support ML Models in Flink SQL
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Wed, Apr 3, 2024 at 10:24 AM Hao Li wrote: > Thanks David Radley and David Moravek for the comments. I'll reply in the > discussion thread. > > Hao > > On Wed, Apr 3, 2024 at 5:45 AM David Morávek wrote: > >> +1 (binding) >> >> My only suggestion would be to move Catalog changes into a separate >> interface to allow us to begin with lower stability guarantees. Existing >> Catalogs would be able to opt-in by implementing it. It's a minor thing >> though, overall the FLIP is solid and the direction is pretty exciting. >> >> Best, >> D. >> >> On Wed, Apr 3, 2024 at 2:31 AM David Radley >> wrote: >> >> > Hi Hao, >> > I don’t think this counts as an objection, I have some comments. I >> should >> > have put this on the discussion thread earlier but have just got to >> this. >> > - I suggest we can put a model version in the model resource. Versions >> are >> > notoriously difficult to add later; I don’t think we want to proliferate >> > differently named models as a model mutates. We may want to work with >> > non-latest models. >> > - I see that the model name is the unique identifier. I realise this >> would >> > move away from the Oracle syntax – so may not be feasible short term; >> but I >> > wonder if we can have: >> > - a uuid as the main identifier and the model name as an attribute. >> > or >> > - a namespace (or something like a system of origin) >> > to help organise models with the same name. >> > - does the model have an owner? I assume that Flink model resource is >> the >> > master of the model? I imagine in the future that a model that comes in >> via >> > a new connector could be kept up to date with the external model and >> would >> > not be allowed to be changed by anything other than the connector. >> > >> >Kind regards, David. 
>> > >> > From: Hao Li >> > Date: Friday, 29 March 2024 at 16:30 >> > To: dev@flink.apache.org >> > Subject: [EXTERNAL] [VOTE] FLIP-437: Support ML Models in Flink SQL >> > Hi devs, >> > >> > I'd like to start a vote on the FLIP-437: Support ML Models in Flink >> > SQL [1]. The discussion thread is here [2]. >> > >> > The vote will be open for at least 72 hours unless there is an >> objection or >> > insufficient votes. >> > >> > [1] >> > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL >> > >> > [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn >> > >> > Thanks, >> > Hao >> > >> > Unless otherwise stated above: >> > >> > IBM United Kingdom Limited >> > Registered in England and Wales with number 741598 >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >> > >> >
[RESULT][VOTE] FLIP-437: Support ML Models in Flink SQL
Hi Dev, I'm happy to announce that FLIP-437: Support ML Models in Flink SQL [1] has been accepted with 7 approving votes (6 binding) [2] Timo Walther (binding) Jark Wu (binding) Yu Chen (non-binding) Piotr Nowojski (binding) Leonard Xu (binding) Martijn Visser (binding) David Moravek (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL [2] https://lists.apache.org/thread/gw1hfnqb05mwrstbtw43yh5tllrscgn6
Re: [DISCUSS] FLIP-507: Add Model DDL methods in TABLE API
Hi Yash, +1 for the proposal. Thanks, Hao On Mon, Feb 10, 2025 at 12:31 AM Yanquan Lv wrote: > Hi, Yash. Thanks for driving it. > +1 for this. > > > On Feb 7, 2025 at 05:28, Yash Anand wrote: > > > > Hi all! I would like to open up for discussion a new FLIP-507 [1]. > > Motivation: This proposal aims to extend model DDL support to the Table > API, > > enabling a seamless development experience where users can define, > manage, > > and utilize ML models entirely within the Table API ecosystem. > > Other model functions like ml_predict() will be added in follow-up > > FLIP(s). > > > > [1] https://cwiki.apache.org/confluence/x/fIxEF > > > > > > Best regards, > > Yash Anand > >
Re: [DISCUSS] FLIP-517: Better Handling of Dynamic Table Primitives with PTFs
Thanks Timo for the FLIP! This is a great improvement to the Flink SQL syntax around tables. I have two clarification questions: 1. For SEARCH_KEY ``` SELECT * FROM t_other, LATERAL SEARCH_KEY( input => t, on_key => DESCRIPTOR(k), lookup => t_other.name, options => MAP[ 'async', 'true', 'retry-predicate', 'lookup_miss', 'retry-strategy', 'fixed_delay', 'fixed-delay'='10s' ] ) ``` Table `t` needs to be an existing `LookupTableSource` [1], right? And we will rewrite it to `StreamPhysicalLookupJoin` [2] or a similar operator during the physical optimization phase. Also, to support passing options, do we need to extend `LookupContext` [3] to have a `getOptions` or `getRuntimeOptions` method? 2. For FROM_CHANGELOG ``` SELECT * FROM FROM_CHANGELOG(s) AS t; ``` Do we need to introduce a `DataStream` resource in SQL first? Hao [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java [2] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 On Fri, Mar 21, 2025 at 6:25 AM Timo Walther wrote: > Hi everyone, > > I would like to start a discussion about FLIP-517: Better Handling of > Dynamic Table Primitives with PTFs [1]. > > In the past months, I have spent a significant amount of time with SQL > semantics and the SQL standard around PTFs, when designing and > implementing FLIP-440 [2]. For those of you that have not taken a look > into the standard, the concept of Polymorphic Table Functions (PTF) > enables syntax for implementing custom SQL operators. In my opinion, > they are kind of a revolution in the SQL language. 
PTFs can take scalar > values, tables, models (in Flink), and column lists as arguments. With > these primitives, we can further address shortcomings in the Flink SQL > language by leveraging syntax and semantics. > > I would like to introduce a couple of built-in PTFs with the goal to make > the handling of dynamic tables easier for users. Once users understand > how a PTF works, they can easily select from a list of functions to > approach a table for snapshots, changelogs, or searching. > > The FLIP proposes: > > SNAPSHOT() > SEARCH_KEY() > TO_CHANGELOG() > FROM_CHANGELOG() > > I'm aware that this is a delicate topic, and might lead to controversial > discussions. I hope with concise naming and syntax the benefit over the > existing syntax becomes clear. > > There are more useful PTFs to come, but those are the ones that I > currently see as the most fundamental ones to tell a round story around > Flink SQL. > > Looking forward to your feedback. > > Thanks, > Timo > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs > [2] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 >
[DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi All, I would like to start a discussion about FLIP-525 [1]: Model ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is co-authored with Shengkai Fang. This FLIP is a follow-up of FLIP-437 [2] that proposes the implementation design for the ML_PREDICT and ML_EVALUATE functions introduced in FLIP-437. For more details, see FLIP-525 [1]. Looking forward to your feedback. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL Thanks, Hao
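For context, these are the two functions whose implementation FLIP-525 designs. The ML_PREDICT call below uses the syntax settled on in the FLIP-437 thread; the ML_EVALUATE call is sketched by analogy and should be checked against the FLIP text:

```sql
-- Inference: appends prediction columns to each input row.
SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));

-- Evaluation: computes model-quality metrics over the input table.
SELECT * FROM
  ML_EVALUATE(
    input => `eval_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));
```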
[DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi All, I would like to start a discussion about FLIP-526 [1]: Model ML_PREDICT, ML_EVALUATE Table API. This FLIP is a follow-up of FLIP-507 [2] that proposes the Table API for model-related functions. This FLIP is also closely related to FLIP-525 [3], which proposes the implementation design for model-related functions. For more details, see FLIP-526 [1]. Looking forward to your feedback. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
Hi Timo, Thanks for the FLIP. +1 with a few questions: 1. Can `StructuredType` be nested? e.g. `STRUCTURED<'com.example.User', name STRING, age INT NOT NULL, address STRUCTURED<'com.example.address', street STRING, zip STRING>>` 2. What's the main reason the class won't be enforced in SQL? Since tables created in SQL can also be used in Table API, will it come as a surprise if it's working in SQL and then failing in Table API? What if `com.example.User` was not validated in SQL when creating the table, and then the class was created for something else with different fields; then in the Table API, it's not compatible. Hao On Tue, Apr 22, 2025 at 9:39 AM Timo Walther wrote: > Hi Arvid, Hi Sergey, > > thanks for your feedback. I updated the FLIP accordingly but let me > answer your questions > here as well: > > > Are we going to enforce that the name is a valid class name? What is > > happening if it's not a correct name? > > What are the implications of using a class that is not in the > > classpath in Table API? It looks to me that the name is metadata-only > > until we try to access the objects directly in Table/DataStream API. > > Names are not enforced or validated. They are pure metadata as mentioned > in Section 2.1. We fall back to Row as the conversion class if the name > cannot be resolved in the current classpath. So when staying in the SQL > ecosystem (i.e. not switching to Table API, DataStream API, or UDFs), > the class does not need to be present. > > > Should Expressions.objectOf(String, Object... kv); also have an > > overload where you can put in the StructuredType in case where > > the class is not in the CP? > > That makes a lot of sense. I added a DataTypes.STRUCTURED(String, > Field...) method and a Expressions.objectOf(String, Object...). > > > What is the expected outcome of supplying fewer keys than defined > > in the structured type? Are we going to make use of nullability here? > > If so, *_INSERT and *_REMOVE may have some use. 
> > Currently, we go with the most conservative approach, which means that > all keys need to be present. Maybe we can reserve this feature to future > work and make the logic more lenient. > > > Talking about nullability: Is there some option to make the declared > > fields NOT NULL? If so, could you amend one example to show that? > > (Grammar? implies that it's not possible) > > NOT NULL is supported similar to ROW. I adjusted one of > the examples. > > > One bigger concern is around the naming. For me, OBJECT is used for > > semi-structured types that are open. Your FLIP implies a closed design > > and that you want to add an open OBJECT later. I asked ChatGPT about > > other DB implementations and it seems like STRUCT is used more often > > (see below). So, I'd propose to call it STRUCT<...>, STRUCT_OF, > > > structOf, UPDATE_STRUCT, and updateStruct respectively. > > Naming is hard. I was also torn between STRUCT, STRUCTURED, or OBJECT. > In Flink, the ROW type is rather our STRUCT type, because it works fully > position based. Structured types might be name-based in the future for > better schema evolution, so they rather model an OBJECT type. This was > my reason for choosing OBJECT_OF (typed to class name and fixed fields) > vs. OBJECT (semi-structured without fixed fields). Snowflake also uses > OBJECT(i INT) (for structured types) and OBJECT (for semi structured > types). > > Also, both structured and semi-structured types can then share functions > such as UPDATE_OBJECT(). > > What do others think? 
> > Thanks, > Timo > > On 22.04.25 12:08, Sergey Nuyanzin wrote: > > Thanks for driving this Timo > > > > The FLIP seems reasonable to me > > > > I have one minor question/clarification > > do I understand it correct that after this FLIP we can execute of > > `typeof` against result of `OBJECT_OF` > > for instance > > SELECT typeof(OBJECT_OF( > >'com.example.User', > >'name', 'Bob', > >'age', 42 > > )); > > > > should return `STRUCTURED<'com.example.User', name STRING, age INT>` > > ? > > > > On Tue, Apr 22, 2025 at 10:57 AM Timo Walther > wrote: > >> > >> Hi everyone, > >> > >> I would like to ask again for feedback on this FLIP. It is a rather > >> small change but with big impact on usability for structured data. > >> > >> Are there any objections? Otherwise I would like to continue with voting > >> soon. > >> > >> Thanks, > >> Timo > >> > >> On 10.04.25 07:54, Timo Walther wrote: > >>> Hi everyone, > >>> > >>> I would like to start a discussion about FLIP-520: Simplify > >>> StructuredType handling [1]. > >>> > >>> Flink SQL already supports structured types in the engine, serializers, > >>> UDFs, and connector interfaces. However, currently only Table API was > >>> able to make use of them. While UDFs can take objects as input and > >>> return types, it is actually quite inconvenient to use them in > >>> transformations. > >>> > >>> This FLIP fixes some immediate blockers in the use of structured types. > >>> > >>> Looking f
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Yash, ML_EVALUATE itself will be a `TableAggregateFunction`. We will only provide one implementation in Flink which will be used in codegen. Only the ML_PREDICT function implementation can be based on providers. Flink will also provide a default implementation for it. Thanks, Hao On Mon, May 5, 2025 at 8:22 AM Yash Anand wrote: > Hi Hao, > > Thanks for the proposal, these are really interesting features to extend > Flink ML use case. > > +1 for the proposal. > > I just have one question, since you plan to extend > SqlMlFunctionTableFunction for both ML functions builtin registrations, > will ML_EVALUATE be an aggregate function or Table function? > > Thanks, > Yash Anand > > On Mon, May 5, 2025 at 4:18 AM Piotr Nowojski > wrote: > > > Hi, > > > > sounds like an interesting feature! > > > > Best, > > Piotrek > > > > On Tue, Apr 29, 2025 at 03:52, Shengkai Fang wrote: > > > Hi, Hao. > > > > > > Thanks for your proposal about ML related functions. This FLIP will > help > > > others to implement their own model provider. > > > > > > +1 for the proposal. > > > > > > Best, > > > Shengkai > > > > > > On Tue, Apr 29, 2025 at 07:22, Hao Li wrote: > > > > Hi All, > > > > > > > > I would like to start a discussion about FLIP-525 [1]: Model > > ML_PREDICT, > > > > ML_EVALUATE Implementation Design. This FLIP is co-authored with > > Shengkai > > > > Fang. > > > > > > > > This FLIP is a follow up of FLIP-437 [2] to propose the > implementation > > > > design for ML_PREDICT and ML_EVALUATE function which were introduced > in > > > > FLIP-437. > > > > > > > > For more details, see FLIP-525 [1]. Looking forward to your feedback. 
> > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > [2] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > > > > > > > > > Thanks, > > > > Hao > > > > > > > > > >
Re: [VOTE] FLIP-520: Simplify StructuredType handling
+1 (non-binding) Thanks for driving this! Hao On Tue, May 6, 2025 at 8:27 AM Ferenc Csaky wrote: > +1 (binding) > > Thanks for driving this! > > Best, > Ferenc > > > > > On Tuesday, May 6th, 2025 at 14:50, Sergey Nuyanzin > wrote: > > > > > > > Thank you for driving this > > +1 (binding) > > > > On Tue, May 6, 2025, 12:04 Timo Walther twal...@apache.org wrote: > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-520: Simplify StructuredType handling > > > [1] which has been discussed in this thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > [1] > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-520%3A+Simplify+StructuredType+handling > > > [2] https://lists.apache.org/thread/mjxypb67bonyv2jf6vq4z6ttnwwxkz9c > > > > > > Cheers, > > > Timo >
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
I think Arvid has a good point. Why not define the Object type without a class, and when you get it in the Table API, try to cast it to some class? I found https://docs.oracle.com/javase/1.5.0/docs/guide/jdbc/getstart/mapping.html. Under the `JAVA_OBJECT` type section, they have: ``` ResultSet rs = stmt.executeQuery("SELECT ENGINEERS FROM PERSONNEL"); while (rs.next()) { Engineer eng = (Engineer)rs.getObject("ENGINEERS"); System.out.println(eng.lastName + ", " + eng.firstName); } ``` For us, how about adding a `getFieldAs(int pos, Class<T> clazz)` method to the Row type? Your example: ``` TableEnvironment env = ... Table t = env.sqlQuery("SELECT OBJECT_OF('com.example.User', 'name', 'Bob', 'age', 42)"); // Tries to resolve `com.example.User` in the classpath, if not present returns `Row` t.execute().collect(); ``` Will be ``` TableEnvironment env = ... Table t = env.sqlQuery("SELECT OBJECT_OF('name', 'Bob', 'age', 42)"); // No class attached; each field is cast explicitly per row for (Row row : t.execute().collect()) { User user = row.getFieldAs(0, User.class); } ``` For Arvid's question: "However, at that point, why do we actually need anything beyond ROW?" Maybe the difference is that the Row type shouldn't support being cast to a user-defined class, but `StructuredType` can be. Thanks, Hao On Wed, Apr 23, 2025 at 2:04 AM Arvid Heise wrote: > Hi Timo, > > thanks for addressing my points. I'm not set on using STRUCT et al. but > wanted to point out the alternatives. > > Regarding the attached class name, I have similar confusion to Hao. I > wonder if Structured types shouldn't be anonymous by default in the sense > that initially we don't attach a class name to it. As you pointed out, it > has no real semantics in SQL and we can't validate it. > Another thing to consider is that if one user creates a table through some > means and another user wants to consume it, the second user may not have > access to the class as is. 
But the user could easily create a compatible > class on its own. > > Consequently, I'm thinking about getting rid of the type at all. Only on > the edges, we can use conversion to the user types when users actually > access the ROW: > * Any table API access that wants to collect results (in your last example > what is t.execute().collect(); returning? How does that work in the > multi-user setup sketched above? Wouldn't it be easier that the consumer > explicitly gives us the POJO type that it expects?) > * Any DataStream conversion > * Any UDF > > However, at that point, why do we actually need anything beyond ROW? > > Best, > > Arvid > > On Wed, Apr 23, 2025 at 8:52 AM Timo Walther wrote: > > > Hi Hao, > > > > 1. Can `StructuredType` be nested? > > > > Yes this is supported. > > > > 2. What's the main reason the class won't be enforced in SQL? > > > > SQL should not care about classes. Within the SQL ecosystem, the SQL > > engine controls the data serialization and protocols. The SQL engine > > will not load the class. Classes are a concept of a JVM or Python API > > endpoint. This also the reason why a SQL ARRAY can be > > represented as List, long[], Long[]. The latter are only concepts > > in the target programming language and might look different in Python. > > > > Regard, > > Timo > > > > > > On 22.04.25 23:54, Hao Li wrote: > > > Hi Timo, > > > > > > Thanks for the FLIP. +1 with a few questions: > > > > > > 1. Can `StructuredType` be nested? e.g. `STRUCTURED<'com.example.User', > > > name STRING, age INT NOT NULL, address > STRUCTURED<'com.example.address', > > > street STRING, zip STRING>>` > > > > > > 2. What's the main reason the class won't be enforced in SQL? Since > > tables > > > created in SQL can also be used in Table API, will it come as a > surprise > > if > > > it's working in SQL and then failing in Table API? 
What if > > > `com.example.User` was not validated in SQL when creating table, then > the > > > class was created for something else with different fields and then in > > > Table api, it's not compatible. > > > > > > Hao > > > > > > On Tue, Apr 22, 2025 at 9:39 AM Timo Walther > wrote: > > > > > >> Hi Arvid, Hi Sergey, > > >> > > >> thanks for your feedback. I updated the FLIP accordingly but let me > > >> answer your questions > > >> here as wel
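The `getFieldAs(int pos, Class)` accessor proposed in this thread could look roughly like the following. This is a minimal, self-contained illustration only: `SimpleRow` is a stand-in for Flink's `Row`, and the method name and signature follow the proposal in the email, not any existing Flink API.

```java
import java.util.List;

// Minimal stand-in for Flink's Row, sketching the proposed
// getFieldAs(int pos, Class<T> clazz) accessor: return the field at the
// given position, cast to the caller-supplied class.
final class SimpleRow {
    private final List<Object> fields;

    SimpleRow(List<Object> fields) {
        this.fields = fields;
    }

    // Cast the field at `pos` to `clazz`, failing with a descriptive error
    // instead of a bare ClassCastException at the call site.
    <T> T getFieldAs(int pos, Class<T> clazz) {
        Object value = fields.get(pos);
        if (value != null && !clazz.isInstance(value)) {
            throw new ClassCastException(
                "Field " + pos + " is " + value.getClass().getName()
                    + ", not " + clazz.getName());
        }
        return clazz.cast(value);
    }
}
```

With such an accessor, a consumer that knows the target class (e.g. `User`) can recover typed objects from an untyped row without the engine ever loading the class, which is the multi-user scenario Arvid describes.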
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
Hi Timo, Thanks for the clarification. It's very helpful. For the classpath, I suppose it can also support Python later if it's called from the Python Table API? Do we want to indicate whether it's a Java classpath or a Python class? Or do we support a list of class paths that can contain Python, Java, or other languages later? Thanks, Hao On Thu, Apr 24, 2025 at 5:34 AM Arvid Heise wrote: > Hi Timo, > > thank you very much for responding. I see that this is just the first step > to get consistency between SQL and Table API and more work is to come. > > I still think that there is some redundancy between STRUCT and ROW but tbh > I have more issues with ROW than with STRUCT. (What is even the meaning of > a nested ROW?) > > So +1 with your proposal and maybe we can deprecate ROW at some later point > in time. > > Best, > > Arvid > > On Thu, Apr 24, 2025 at 11:57 AM Timo Walther wrote: > > > Hi Arvid, Hi Hao, > > > > thanks for this valuable feedback. Let me clarify a few things before I > > go into the details. > > > > Just to avoid any confusion: the FLIP does not propose introducing the > > StructuredType. Structured types backed by classes already exist in > > Flink for years and are already supported in UDFs, Table.collect(), > > StreamTableEnvironment.toDataStream, and connectors. Structured types > > have been introduced for a better programmatic story in Table API. They > > avoid the need for manually defining the full schema at the edges. > > Manual schema work is annoying and with structured types it is possible > > to use classes wherever a type is expected. > > > > The goal of this FLIP is only to bring Table API and SQL closer together. > > In general, this is only the first step of my larger vision of > > structured data handling. 
There are basically 3 kinds of structured > types: > > > > 1) a typed, fixed field struct like STRUCTURED<'Money', i INT, s STRING> > > 2) an untyped, fixed field struct like STRUCTURED > > (similar to Snowflake OBJECT(i INT, s STRING)) > > 3) an untyped struct for semi-structured data like STRUCTURED (similar > > to Snowflake OBJECT) > > > > RowType represents 2), StructuredType represents 1) and a future > > semi-structured type can represent 3) (but out of scope for this FLIP). > > > > If we don't support a typed struct, Money(i INT) and User(i INT) are not > > distinct in SQL. For table.collect() or eval(Row row) in UDFs, it would > > mean that those need the full schema declaration in order to map to a > > target type. Structured types avoid all of that and make Table API very > > powerful. > > > > Usually both the UDF and the collect()/toDataStream() are defined in the > > same Table API program. Thus, the class is usually present in the same > > classpath and this becomes less of an issue in production. Casting > > structured types to ROW is also supported. > > > > The implementation effort of this FLIP is very low. It's mostly intended > > to fill missing gaps, no major overhaul of the type system. Also to > > avoid any backwards compatibility issues. > > > > Let me know what you think. > > > > Cheers, > > Timo > > > > On 23.04.25 21:27, Hao Li wrote: > > > I think Arvid has a good point. Why not define Object type without > class > > > and when you get it in table api, try to cast it to some class? I found > > > > > > https://docs.oracle.com/javase/1.5.0/docs/guide/jdbc/getstart/mapping.html > > . > > > Under `JAVA_OBJECT` type section. 
They have: > > > > > > ``` > > > > > > ResultSet rs = stmt.executeQuery("SELECT ENGINEERS FROM PERSONNEL"); > > > while (rs.next()) { > > > Engineer eng = (Engineer)rs.getObject("ENGINEERS"); > > > System.out.println(eng.lastName + ", " + eng.firstName); > > > } > > > > > > ``` > > > > > > For us, how about add `getFieldAs(int post, Class class)` method in Row > > > type? Your example: > > > > > > ``` > > > > > > TableEnvironment env = ... > > > > > > Table t = env.sqlQuery("SELECT OBJECT_OF('com.example.User', 'name', > > 'Bob', > > > 'age', 42)"); > > > > > > // Tries to resolve `com.example.User` in the classpath, if not present > > > returns `Row` > > > t.execute().collect(); > > > ``` > > > > > > Will be > > > ``` > > > TableEnvironment env = ... > &g
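The "resolve the class in the classpath, if not present fall back to a generic row" behaviour discussed above could be sketched as follows. Everything here is illustrative, not Flink's implementation: `ObjectOfSketch` and its reflective field copying are assumptions, the nested `User` class is a hypothetical stand-in for `com.example.User`, and a plain `Map` stands in for Flink's `Row` as the untyped fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not Flink code): try to resolve the class and build
// a typed instance from the key/value pairs; if the class is absent from
// the classpath, fall back to an untyped representation.
final class ObjectOfSketch {

    // Hypothetical demo POJO standing in for com.example.User.
    public static class User {
        public String name;
        public int age;
    }

    static Object objectOf(String className, Object... kvPairs) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < kvPairs.length; i += 2) {
            fields.put((String) kvPairs[i], kvPairs[i + 1]);
        }
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                var field = clazz.getDeclaredField(e.getKey());
                field.setAccessible(true);
                field.set(instance, e.getValue());
            }
            return instance; // class was on the classpath: typed object
        } catch (ReflectiveOperationException e) {
            return fields; // class absent: untyped fallback (stand-in for Row)
        }
    }
}
```

This mirrors the multi-user concern in the thread: the producer can attach a class name, and a consumer without that class still gets usable untyped data.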
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Thanks Mayank for the proposal. I think it's a great addition to Flink to define secure connectivity in general for table, model and other resources later on. +1. Hao On Fri, May 2, 2025 at 5:32 AM Gustavo de Morais wrote: > Hi Mayank, > > Thanks for the initiative. Looking at the FLIP, this looks like a > well-thought-out proposal that addresses a clear need for more secure and > reusable external connections in Flink SQL and Table API. Separating > connection details would a valuable improvement. > > Best Regards, > Gustavo > > Am Fr., 2. Mai 2025 um 07:12 Uhr schrieb Ferenc Csaky > : > > > Hi Mayank, > > > > Thank you for starting the discussion! In general, I think such > > functionality > > would be a really great addition to Flink. > > > > Could you pls. elaborate a bit more one what is the reason of defining a > > `connection` resource on the database level instead of the catalog level? > > If I think about `JdbcCatalog`, or `HiveCatalog`, the catalog is in > 1-to-1 > > mapping with an RDBMS, or a HiveMetastore, so my initial thinking is > that a > > `connection` seems more like a catalog level resource. > > > > WDYT? > > > > Thanks, > > Ferenc > > > > > > > > On Tuesday, April 29th, 2025 at 17:08, Mayank Juneja < > > mayankjunej...@gmail.com> wrote: > > > > > > > > > > > Hi all, > > > > > > I would like to open up for discussion a new FLIP-529 [1]. > > > > > > Motivation: > > > Currently, Flink SQL handles external connectivity by defining > endpoints > > > and credentials in table configuration. This approach prevents > > reusability > > > of these connections and makes table definition less secure by exposing > > > sensitive information. > > > We propose the introduction of a new "connection" resource in Flink. > This > > > will be a pluggable resource configured with a remote endpoint and > > > associated access key. 
Once defined, connections can be reused across > > table > > > definitions, and eventually for model definition (as discussed in > > FLIP-437) > > > for inference, enabling seamless and secure integration with external > > > systems. > > > The connection resource will provide a new, optional way to manage > > external > > > connectivity in Flink. Existing methods for table definitions will > remain > > > unchanged. > > > > > > [1] https://cwiki.apache.org/confluence/x/cYroF > > > > > > Best Regards, > > > Mayank Juneja > > >
Re: [VOTE] FLIP-507: Add Model DDL methods in TABLE API
+1 (non-binding) Thanks Yash, Hao On Tue, Feb 18, 2025 at 10:46 AM Yash Anand wrote: > Hi Everyone, > > I'd like to start a vote on FLIP-507: Add Model DDL methods in TABLE API > [1] which has been discussed in this thread [2]. > > The vote will be open for at least 72 hours unless there is an objection or > not enough votes. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > [2] https://lists.apache.org/thread/w9dt6y1w0yns5j3g4685tstjdg5flvy9 >
Re: [DISCUSS] FLIP-517: Better Handling of Dynamic Table Primitives with PTFs
Hi Timo, One question I have is: what's the SEARCH_KEY result schema you have in mind? Can it output multiple rows for every row in the left table, or does it need to pack the result into a single row as an array? Thanks, Hao On Mon, Mar 24, 2025 at 10:20 AM Hao Li wrote: > Thanks Timo for the FLIP! This is a great improvement to the Flink SQL > syntax around tables. I have two clarification questions: > > 1. For SEARCH_KEY > ``` > SELECT * > FROM > t_other, > LATERAL SEARCH_KEY( > input => t, > on_key => DESCRIPTOR(k), > lookup => t_other.name, > options => MAP[ > 'async', 'true', > 'retry-predicate', 'lookup_miss', > 'retry-strategy', 'fixed_delay', > 'fixed-delay', '10s' > ] > ) > ``` > Table `t` needs to be an existing `LookupTableSource` [1], right? And we > will rewrite it to `StreamPhysicalLookupJoin` [2] or similar operator > during the physical optimization phase. > Also to support passing options, we need to extend `LookupContext` [3] to > have a `getOptions` or `getRuntimeOptions` method? > > 2. For FROM_CHANGELOG > ``` > SELECT * FROM FROM_CHANGELOG(s) AS t; > ``` > Do we need to introduce a `DataStream` resource in SQL first? > > > Hao > > > > > [1] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java > [2] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 > [3] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 > > > On Fri, Mar 21, 2025 at 6:25 AM Timo Walther wrote: >> Hi everyone, >> >> I would like to start a discussion about FLIP-517: Better Handling of >> Dynamic Table Primitives with PTFs [1]. 
>> >> In the past months, I have spent a significant amount of time with SQL >> semantics and the SQL standard around PTFs, when designing and >> implementing FLIP-440 [2]. For those of you that have not taken a look >> into the standard, the concept of Polymorphic Table Functions (PTF) >> enables syntax for implementing custom SQL operators. In my opinion, >> they are kind of a revolution in the SQL language. PTFs can take scalar >> values, tables, models (in Flink), and column lists as arguments. With >> these primitives, we can further evolve shortcomings in the Flink SQL >> language by leveraging syntax and semantics. >> >> I would like introduce a couple of built-in PTFs with the goal to make >> the handling of dynamic tables easier for users. Once users understand >> how a PTF works, they can easily select from a list of functions to >> approach a table for snapshots, changelogs, or searching. >> >> The FLIP proposes: >> >> SNAPSHOT() >> SEARCH_KEY() >> TO_CHANGELOG() >> FROM_CHANGELOG() >> >> I'm aware that this is a delicate topic, and might lead to controversial >> discussions. I hope with concise naming and syntax the benefit over the >> existing syntax becomes clear. >> >> There are more useful PTFs to come, but those are the ones that I >> currently see as the most fundamental ones to tell a round story around >> Flink SQL. >> >> Looking forward to your feedback. >> >> Thanks, >> Timo >> >> [1] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs >> [2] >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 >> >
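The `options` map passed to SEARCH_KEY above would presumably be parsed into some typed lookup/retry configuration on the planner side. A rough sketch of that parsing, with purely illustrative names (`LookupRetryConfig` and its fields are assumptions, not Flink API):

```java
import java.util.Map;

// Hypothetical sketch: turn the string-to-string `options` map from the
// SEARCH_KEY example into a typed configuration. Defaults mirror the
// values shown in the SQL example above.
record LookupRetryConfig(boolean async, String retryPredicate,
                         String retryStrategy, String fixedDelay) {

    static LookupRetryConfig fromOptions(Map<String, String> options) {
        return new LookupRetryConfig(
            Boolean.parseBoolean(options.getOrDefault("async", "false")),
            options.getOrDefault("retry-predicate", "lookup_miss"),
            options.getOrDefault("retry-strategy", "fixed_delay"),
            options.getOrDefault("fixed-delay", "10s"));
    }
}
```

Something like this is also what a `getOptions`-style method on `LookupContext`, as raised in the thread, would expose to the connector.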
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Ron, I found these names in other systems: `task_type` in big query ML [1] `model_type` in databricks [2] `task` is more of an abbreviated version from `task_type`. [1] https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate [2] https://www.databricks.com/blog/2022/04/19/model-evaluation-in-mlflow.html Thanks, Hao On Tue, May 6, 2025 at 10:55 PM Ron Liu wrote: > > It's mainly used for model evaluation purposes for `ML_EVALUATE`. > Different > loss functions will be used and different metrics will be output for > `ML_EVALUATE` based on the task option of the model. Task option is not > necessary if the model > is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overloading method > which can override the task type during evaluation. > > From your explanation, I personally feel that it might be more appropriate > to replace task with a word more suited to the scenario, but of course I > don't have a good suggestion at the moment, just a suggestion. > > Best, > Ron > > Hao Li 于2025年5月7日周三 11:24写道: > > > Hi Yunfeng, Ron, > > > > Thanks for the feedback. > > > > > it might be better to change the configuration api_key to apikey > > Make sense. I updated the FLIP. > > > > > Why is it necessary to define the task option in the WITH clause of the > > Model DDL, and what is its purpose? > > It's mainly used for model evaluation purposes for `ML_EVALUATE`. > Different > > loss functions will be used and different metrics will be output for > > `ML_EVALUATE` based on the task option of the model. Task option is not > > necessary if the model > > is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overloading > method > > which can override the task type during evaluation. > > > > Apart from evaluation, in the future, if model training is supported in > > Flink, it can also serve the purpose of how the model can be trained. 
> > > > > About the CatalogModel interface, why does it need `getInputSchema` and > > `getOutputSchema` methods? What is the role of Schema? > > Schema is mainly to specific the input and output data type of the model > > when it's used in prediction. During prediction, `ML_PREDICT` takes > columns > > from the input table matching the models input schema types and output > > columns based on the model's output schema type. > > > > > Regarding the ModelProvider interface, what is the role of the copy > > method? > > I think it can be useful in the future if we need to copy it during the > > planning stage and apply mutations to the provider. But it may not be > used > > for now. I'm also ok to remove it. > > > > > > Hope this answers your question. > > > > Thanks, > > Hao > > > > > > On Tue, May 6, 2025 at 7:49 PM Ron Liu wrote: > > > > > Hi, Hao > > > > > > Thanks for starting this proposal, it's a great feature, +1. > > > > > > Since I was missing some context, I went to FLIP-437. Combining these > two > > > FLIPs, I have the following three questions: > > > 1. Why is it necessary to define the task option in the WITH clause of > > the > > > Model DDL, and what is its purpose? I understand that one model can > > support > > > various types of tasks such as regression, classification, clustering, > > > etc... But the example you have given gives me the impression that > model > > > can only perform a specific type of task, which confuses me. I think > the > > > task option is not needed > > > > > > 2. About the CatalogModel interface, why does it need `getInputSchema` > > and > > > `getOutputSchema` method, What is the role of Schema? > > > > > > 3. Regarding the ModelProvider interface, what is the role of the copy > > > method? Since I don't know much about the implementation details, I'm > > > curious about what cases need to be copied. 
> > > > > > > > > Best, > > > Ron > > > > > > Yunfeng Zhou 于2025年5月7日周三 09:33写道: > > > > > > > Hi Hao, > > > > > > > > Thanks for the FLIP! It provides a clearer guideline for developers > to > > > > implement model functions. > > > > > > > > One minor comment: it might be better to change the configuration > > api_key > > > > to apikey, which corresponds to GlobalConfiguration.SENSITIVE_KEYS. > > > > Otherwise users’ secrets might be exposed in logs and cause security > > &g
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Fri, May 9, 2025 at 12:43 AM Martijn Visser wrote: > +1 (binding) > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou > wrote: > > > +1(non-binding) > > > > Best, > > Yunfeng > > > > > 2025年5月9日 00:01,Hao Li 写道: > > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, ML_EVALUATE > > > Implementation Design [1], which has been discussed in this thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > Thanks, > > > Hao > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > [2] https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x > > > > >
[RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1] has been accepted with 6 approving votes (3 binding) [2] Yash Anand (non-binding) Mayank Juneja (non-binding) Shengkai Fang (binding) Ron Liu (binding) Yunfeng Zhou (non-binding) Martijn Visser (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55
Re: [VOTE] FLIP-516: Multi-Way Join Operator
+1 (non-binding) Thanks, Hao On Fri, May 9, 2025 at 3:41 AM Arvid Heise wrote: > +1 (binding) > > Cheers > > On Wed, May 7, 2025 at 6:37 PM Gustavo de Morais > wrote: > > > Hi everyone, > > > > I'd like to start voting on FLIP-516: Multi-Way Join Operator [1]. The > > discussion can be found in this thread [2]. > > The vote will be open for at least 72 hours, unless there is an objection > > or not enough votes. > > > > Thanks, > > > > Gustavo de Morais > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator > > > > [2] https://lists.apache.org/thread/9b9cy8o2qjt7w2n7j0g4bbrwvy9n61xv > > >
[VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi everyone, I'd like to start a vote on FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1], which has been discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Ron, > whether the predict or evaluate introduced in this FLIP can be serialized to SQL? Yes. It can be serialized to SQL. We just need to provide serialization function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. Thanks, Hao On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > > Hi, Hao > > +1 for the proposal. > > Due to [1][2] having supported serialize table API to SQL, so I've one > question: whether the predict or evaluate introduced in this FLIP can be > serialized to SQL? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-502%3A+QueryOperation+SQL+Serialization+customization > > > Kindly reminder: The FLIP link [1] is error in your first email. > > Best, > Ron > > Yash Anand 于2025年5月5日周一 23:24写道: > > > Hi, Hao. > > > > +1 for the proposal. > > > > Thanks > > Yas > > > > On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > > wrote: > > > > > Hi, > > > > > > sounds like a valuable addition! > > > > > > Best, > > > Piotrek > > > > > > wt., 29 kwi 2025 o 03:57 Shengkai Fang napisał(a): > > > > > > > Hi, Hao. > > > > > > > > +1 for the proposal. > > > > > > > > Best, > > > > Shengkai > > > > > > > > Hao Li 于2025年4月29日周二 07:27写道: > > > > > > > > > Hi All, > > > > > > > > > > I would like to start a discussion about FLIP-526 [1]: Model > > > ML_PREDICT, > > > > > ML_EVALUATE Table API. > > > > > > > > > > This FLIP is a follow up of FLIP-507 [2] to propose the table api for > > > > model > > > > > related functions. This FLIP is also closely related to FLIP-525 [3] > > > > which > > > > > is the proposal for model related function implementation design. > > > > > > > > > > For more details, see FLIP-526 [1]. Looking forward to your feedback. 
> > > > > > > > > > Thanks, > > > > > Hao > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > [2] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > > > > > [3] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > > > > > > > > >
[VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi everyone, I'd like to start a vote on FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API [1], which has been discussed in this thread [2]. The vote will be open for at least 72 business hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API [2] https://lists.apache.org/thread/hlmm9l6qr3pdhs8oc8ltn4pf0s6tg3x4
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Yunfeng, Ron, Thanks for the feedback. > it might be better to change the configuration api_key to apikey Makes sense. I updated the FLIP. > Why is it necessary to define the task option in the WITH clause of the Model DDL, and what is its purpose? It's mainly used for model evaluation purposes in `ML_EVALUATE`. Different loss functions will be used and different metrics will be output by `ML_EVALUATE` based on the task option of the model. The task option is not necessary if the model is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overload which can override the task type during evaluation. Apart from evaluation, in the future, if model training is supported in Flink, it can also indicate how the model should be trained. > About the CatalogModel interface, why does it need `getInputSchema` and `getOutputSchema` methods? What is the role of Schema? The schema mainly specifies the input and output data types of the model when it's used in prediction. During prediction, `ML_PREDICT` takes columns from the input table matching the model's input schema types and outputs columns based on the model's output schema types. > Regarding the ModelProvider interface, what is the role of the copy method? I think it can be useful in the future if we need to copy it during the planning stage and apply mutations to the provider. But it may not be used for now. I'm also ok to remove it. Hope this answers your questions. Thanks, Hao On Tue, May 6, 2025 at 7:49 PM Ron Liu wrote: > Hi, Hao > > Thanks for starting this proposal, it's a great feature, +1. > > Since I was missing some context, I went to FLIP-437. Combining these two > FLIPs, I have the following three questions: > 1. Why is it necessary to define the task option in the WITH clause of the > Model DDL, and what is its purpose? I understand that one model can support > various types of tasks such as regression, classification, clustering, > etc... 
But the example you have given gives me the impression that model > can only perform a specific type of task, which confuses me. I think the > task option is not needed > > 2. About the CatalogModel interface, why does it need `getInputSchema` and > `getOutputSchema` method, What is the role of Schema? > > 3. Regarding the ModelProvider interface, what is the role of the copy > method? Since I don't know much about the implementation details, I'm > curious about what cases need to be copied. > > > Best, > Ron > > Yunfeng Zhou 于2025年5月7日周三 09:33写道: > > > Hi Hao, > > > > Thanks for the FLIP! It provides a clearer guideline for developers to > > implement model functions. > > > > One minor comment: it might be better to change the configuration api_key > > to apikey, which corresponds to GlobalConfiguration.SENSITIVE_KEYS. > > Otherwise users’ secrets might be exposed in logs and cause security > risks. > > > > Best, > > Yunfeng > > > > > > > 2025年4月29日 07:22,Hao Li 写道: > > > > > > Hi All, > > > > > > I would like to start a discussion about FLIP-525 [1]: Model > ML_PREDICT, > > > ML_EVALUATE Implementation Design. This FLIP is co-authored with > Shengkai > > > Fang. > > > > > > This FLIP is a follow up of FLIP-437 [2] to propose the implementation > > > design for ML_PREDICT and ML_EVALUATE function which were introduced in > > > FLIP-437. > > > > > > For more details, see FLIP-525 [1]. Looking forward to your feedback. > > > > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > [2] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > > > > > > Thanks, > > > Hao > > > > >
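The schema-matching behaviour described for `ML_PREDICT` in this thread can be sketched as a standalone type check. This is illustrative only: `ModelSchemaCheck` is a hypothetical helper, types are represented as plain strings, and appending the model's output columns to the input columns is an assumption about the result shape, not a statement of Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the check described above: ML_PREDICT's input
// columns must match the model's declared input schema, and the result
// carries the model's declared output columns.
final class ModelSchemaCheck {

    // Returns the result column types, assuming (as a sketch) that the
    // model's output columns are appended to the matched input columns.
    static List<String> predictResultTypes(
            List<String> inputColumnTypes,
            List<String> modelInputSchema,
            List<String> modelOutputSchema) {
        if (!inputColumnTypes.equals(modelInputSchema)) {
            throw new IllegalArgumentException(
                "Input columns " + inputColumnTypes
                    + " do not match model input schema " + modelInputSchema);
        }
        List<String> result = new ArrayList<>(inputColumnTypes);
        result.addAll(modelOutputSchema);
        return result;
    }
}
```

This is essentially why `CatalogModel` needs `getInputSchema` and `getOutputSchema`: without them, the planner has nothing to validate the input columns against and no way to derive the output row type.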
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi all, If there are no more discussions. I will start voting tomorrow. Thanks, Hao On Thu, May 8, 2025 at 10:38 AM Hao Li wrote: > Hi Ron, > > > whether the predict or evaluate introduced in this FLIP can be > serialized to SQL? > > Yes. It can be serialized to SQL. We just need to provide serialization > function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. > > Thanks, > Hao > > > On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > > > > Hi, Hao > > > > +1 for the proposal. > > > > Due to [1][2] having supported serialize table API to SQL, so I've one > > question: whether the predict or evaluate introduced in this FLIP can be > > serialized to SQL? > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-502%3A+QueryOperation+SQL+Serialization+customization > > > > > > Kindly reminder: The FLIP link [1] is error in your first email. > > > > Best, > > Ron > > > > Yash Anand 于2025年5月5日周一 23:24写道: > > > > > Hi, Hao. > > > > > > +1 for the proposal. > > > > > > Thanks > > > Yas > > > > > > On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > > > wrote: > > > > > > > Hi, > > > > > > > > sounds like a valuable addition! > > > > > > > > Best, > > > > Piotrek > > > > > > > > wt., 29 kwi 2025 o 03:57 Shengkai Fang > napisał(a): > > > > > > > > > Hi, Hao. > > > > > > > > > > +1 for the proposal. > > > > > > > > > > Best, > > > > > Shengkai > > > > > > > > > > Hao Li 于2025年4月29日周二 07:27写道: > > > > > > > > > > > Hi All, > > > > > > > > > > > > I would like to start a discussion about FLIP-526 [1]: Model > > > > ML_PREDICT, > > > > > > ML_EVALUATE Table API. > > > > > > > > > > > > This FLIP is a follow up of FLIP-507 [2] to propose the table > api for > > > > > model > > > > > > related functions. 
This FLIP is also closely related to FLIP-525 > [3] > > > > > which > > > > > > is the proposal for model related function implementation design. > > > > > > > > > > > > For more details, see FLIP-526 [1]. Looking forward to your > feedback. > > > > > > > > > > > > Thanks, > > > > > > Hao > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > > > > > > [3] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Timo, > I don't fully understand why we need BuiltinFunctionQueryOperation and BuiltInCallExpression The main reason is that `CallExpression` and `FunctionQueryOperation` seem to be for `FunctionDefinition` instead of `SqlFunction`. So I want to make new classes just for `SqlFunction`. But of course we can add more information to those existing classes to cater for `SqlFunction` as well. I agree these are implementation details. I'll remove these function definitions and we can discuss them in PR. Thanks, Hao On Mon, May 12, 2025 at 7:58 AM Timo Walther wrote: > One last thing to mention: I don't fully understand why we need > BuiltinFunctionQueryOperation and BuiltInCallExpression. But we can > discuss this in the PR as this is not public API. In any case, we should > make sure to keep the changes to visitors and concepts in Table API > small. In the end, most stuff should be reusable from the PTF work. > > Cheers, > Timo > > > On 12.05.25 16:46, Timo Walther wrote: > > +1 to continue to voting. > > > > Thanks, > > Timo > > > > On 11.05.25 18:47, Hao Li wrote: > >> Hi all, > >> > >> If there are no more discussions. I will start voting tomorrow. > >> > >> Thanks, > >> Hao > >> > >> On Thu, May 8, 2025 at 10:38 AM Hao Li wrote: > >> > >>> Hi Ron, > >>> > >>>> whether the predict or evaluate introduced in this FLIP can be > >>> serialized to SQL? > >>> > >>> Yes. It can be serialized to SQL. We just need to provide serialization > >>> function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. > >>> > >>> Thanks, > >>> Hao > >>> > >>> > >>> On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > >>>> > >>>> Hi, Hao > >>>> > >>>> +1 for the proposal. > >>>> > >>>> Due to [1][2] having supported serialize table API to SQL, so I've one > >>>> question: whether the predict or evaluate introduced in this FLIP > >>>> can be > >>>> serialized to SQL? 
> >>>> > >>>> [1] > >>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-393%3A+Make+QueryOperations+SQL+serializable > >>>> [2] > >>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-502%3A+QueryOperation+SQL+Serialization+customization > >>>> > >>>> > >>>> Kindly reminder: The FLIP link [1] is error in your first email. > >>>> > >>>> Best, > >>>> Ron > >>>> > >>>> Yash Anand 于2025年5月5日周一 23:24写道: > >>>> > >>>>> Hi, Hao. > >>>>> > >>>>> +1 for the proposal. > >>>>> > >>>>> Thanks > >>>>> Yas > >>>>> > >>>>> On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > >>>>> wrote: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> sounds like a valuable addition! > >>>>>> > >>>>>> Best, > >>>>>> Piotrek > >>>>>> > >>>>>> wt., 29 kwi 2025 o 03:57 Shengkai Fang > >>> napisał(a): > >>>>>> > >>>>>>> Hi, Hao. > >>>>>>> > >>>>>>> +1 for the proposal. > >>>>>>> > >>>>>>> Best, > >>>>>>> Shengkai > >>>>>>> > >>>>>>> Hao Li 于2025年4月29日周二 07:27写道: > >>>>>>> > >>>>>>>> Hi All, > >>>>>>>> > >>>>>>>> I would like to start a discussion about FLIP-526 [1]: Model > >>>>>> ML_PREDICT, > >>>>>>>> ML_EVALUATE Table API. > >>>>>>>> > >>>>>>>> This FLIP is a follow up of FLIP-507 [2] to propose the table > >>> api for > >>>>>>> model > >>>>>>>> related functions. This FLIP is also closely related to FLIP-525 > >>> [3] > >>>>>>> which > >>>>>>>> is the proposal for model related function implementation design. > >>>>>>>> > >>>>>>>> For more details, see FLIP-526 [1]. Looking forward to your > >>> feedback. 
> >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Hao > >>>>>>>> > >>>>>>>> > >>>>>>>> [1] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > >>>>>>>> [2] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > >>>>>>>> [3] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > >> > > > >
Re: [RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Timo, Thanks for letting me know. I'll reopen the vote. Should we update the FLIP wiki [1] to include the 72 business hours? [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals Thanks, Hao On Mon, May 12, 2025 at 5:50 AM Timo Walther wrote: > Hi Hao, > > please note that 72 hours in VOTE threads refers to business days so > excluding weekends. This vote should be open for at least 1 more day. > > Regards, > Timo > > > > On 11.05.25 18:19, Hao Li wrote: > > Hi Dev, > > > > I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE > > Implementation Design [1] has been accepted with 6 approving votes (3 > > binding) [2] > > > > Yash Anand (non-binding) > > Mayank Juneja (non-binding) > > Shengkai Fang (binding) > > Ron Liu (binding) > > Yunfeng Zhou (non-binding) > > Martijn Visser (binding) > > > > Thanks, > > Hao > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 > > > >
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi all, I was reminded that 72 hours means business days. So the vote is still open :) Sorry about the confusion. Thanks, Hao On Mon, May 12, 2025 at 1:18 AM Sergey Nuyanzin wrote: > +1 (binding) > > On Mon, May 12, 2025 at 9:55 AM Piotr Nowojski > wrote: > > > > +1 (binding) > > > > Best, > > Piotrek > > > > niedz., 11 maj 2025 o 18:14 Hao Li > napisał(a): > > > > > Hi Dev, > > > > > > Thanks all for voting. I'm closing the vote and the result will be > posted > > > in a separate email. > > > > > > Thanks, > > > Hao > > > > > > On Fri, May 9, 2025 at 12:43 AM Martijn Visser < > martijnvis...@apache.org> > > > wrote: > > > > > > > +1 (binding) > > > > > > > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou < > flink.zhouyunf...@gmail.com > > > > > > > > wrote: > > > > > > > > > +1(non-binding) > > > > > > > > > > Best, > > > > > Yunfeng > > > > > > > > > > > 2025年5月9日 00:01,Hao Li 写道: > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, > ML_EVALUATE > > > > > > Implementation Design [1], which has been discussed in this > thread > > > [2]. > > > > > > > > > > > > The vote will be open for at least 72 hours unless there is an > > > > objection > > > > > > or not enough votes. > > > > > > > > > > > > Thanks, > > > > > > Hao > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > [2] > https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x > > > > > > > > > > > > > > > > > > > > > -- > Best regards, > Sergey >
Re: [RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1] has been accepted with 8 approving votes (5 binding) [2]. There are no disapproving votes. Yash Anand (non-binding) Mayank Juneja (non-binding) Shengkai Fang (binding) Ron Liu (binding) Yunfeng Zhou (non-binding) Martijn Visser (binding) Piotr Nowojski (binding) Sergey Nuyanzin (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 On Mon, May 12, 2025 at 9:45 AM Hao Li wrote: > Hi Timo, > > Thanks for letting me know. I'll reopen the vote. Should we update the > FLIP wiki [1] to include the 72 business hours? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > Thanks, > Hao > > On Mon, May 12, 2025 at 5:50 AM Timo Walther wrote: > >> Hi Hao, >> >> please note that 72 hours in VOTE threads refers to business days so >> excluding weekends. This vote should be open for at least 1 more day. >> >> Regards, >> Timo >> >> >> >> On 11.05.25 18:19, Hao Li wrote: >> > Hi Dev, >> > >> > I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE >> > Implementation Design [1] has been accepted with 6 approving votes (3 >> > binding) [2] >> > >> > Yash Anand (non-binding) >> > Mayank Juneja (non-binding) >> > Shengkai Fang (binding) >> > Ron Liu (binding) >> > Yunfeng Zhou (non-binding) >> > Martijn Visser (binding) >> > >> > Thanks, >> > Hao >> > >> > [1] >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design >> > [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 >> > >> >>
Re: [VOTE] FLIP-530: Dynamic job configuration
+1 (non-binding) Thanks, Hao On Tue, May 20, 2025 at 12:39 PM Piotr Nowojski wrote: > Hi, > > +1 (binding) > > Best, Piotrek > > wt., 20 maj 2025 o 20:29 Roman Khachatryan napisał(a): > > > Hi everyone, > > > > I'd like to start a vote on FLIP-530: Dynamic job configuration > > [1] which has been discussed in this thread [2]. > > > > The vote will be open for at least 72 hours unless there is an objection > > or not enough votes. > > > > [1] > > https://cwiki.apache.org/confluence/x/uglKFQ > > > > [2] > > https://lists.apache.org/thread/w1m420jx6h5cjv4rfy229xs00mmn7pwg > > > > Regards, > > Roman > > >
Re: [DISCUSS] FLIP-531: Initiate Flink Agents as a new Sub-Project
Hi Xintong, Sean and Chris, Thanks for driving the initiative. Very exciting to bring AI Agents to Flink to empower streaming use cases. +1 to the FLIP. Thanks, Hao On Wed, May 21, 2025 at 7:35 AM Nishita Pattanayak < nishita.pattana...@gmail.com> wrote: > Hi Sean, Chris and Xintong. This seems to be a very exciting sub-project. > +1 for the "flink-agents" sub-project. > > I was going through the FLIP, and had some questions regarding the same: > 1. How would external model calls (e.g., OpenAI or internal LLMs) > be integrated into Flink tasks without introducing backpressure or latency > issues? > In my experience, calling an external LLM carries the following > risks: it is latency-sensitive (LLM inference can take hundreds of milliseconds > to seconds), flaky (network issues, rate limits), and > non-deterministic (timeouts, retries, etc.). It would be great to > work/brainstorm on how we solve these issues. > 2. In traditional agent workflows, user feedback often plays a key role in > validating and improving agent outputs. In a continuous, long-running > Flink-based agent system, where interactions might not be user-facing or > synchronous, how do we incorporate human-in-the-loop feedback or > correctness signals to validate and iteratively improve agent behavior? > > This is a really exciting direction for the Flink ecosystem. The idea of > building long-running, context-aware agents natively on Flink feels like a > natural evolution of stream processing. I'd love to see this mature and > would be excited to contribute in any way I can to help productionize and > validate this in real-world use cases. > > On Wed, May 21, 2025 at 8:52 AM Xintong Song > wrote: > > > Hi devs, > > > > Sean, Chris and I would like to start a discussion on FLIP-531 [1], about > > introducing a new sub-project, Flink Agents. > > > > With the rise of agentic AI, we have identified great new opportunities > for > > Flink, particularly in the system-triggered agent scenarios. 
We believe > the > > future of AI agent applications is industrialized, where agents will not > > only be triggered by users, but increasingly by systems as well. Flink's > > capabilities in real-time distributed event processing, state > > management and exactly-once consistency and fault tolerance make it well-suited > > as a framework for building such system-triggered agents. Furthermore, > > system-triggered agents are often tightly coupled with data processing. > > Flink's outstanding data processing capabilities allow seamless > > integration between data and agentic processing. These capabilities > > differentiate Flink from other agent frameworks with unique advantages in > > the context of system-triggered agents. > > > > We propose this effort as a sub-project of Apache Flink, with a separate > > code repository and lightweight development process, for rapid iteration > > during the early stage. > > > > Please note that this FLIP is focused on the high-level plans, including > > motivation, positioning, goals, roadmap, and operating model of the > > project. Detailed technical design is out of scope and will be > > discussed during the rapid prototyping and iterations. > > > > For more details, please check the FLIP [1]. Looking forward to your > > feedback. > > > > Best, > > > > Xintong > > > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-531%3A+Initiate+Flink+Agents+as+a+new+Sub-Project > > >
Re: [VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Fri, May 30, 2025 at 12:01 AM Robert Metzger wrote: > +1 (binding) > > On Fri, May 30, 2025 at 8:55 AM Piotr Nowojski > wrote: > > > +1 (binding) > > > > Best, > > Piotrek > > > > Shengkai Fang wrote on Thu, May 29, 2025 at 05:58: > > > +1(binding) > > > > > > Best, > > > Shengkai > > > > > > Timo Walther wrote on Thu, May 29, 2025 at 00:00: > > > > +1 (binding) > > > > > > > > We should still check whether changes to QueryOperationVisitor are > > > > necessary but this is internal API and should not block the FLIP. The > public > > > > API looks correct to me. > > > > > > > > Cheers, > > > > Timo > > > > > > > > > > > > On 12.05.25 18:50, Hao Li wrote: > > > > > Hi everyone, > > > > > > > > > > I'd like to start a vote on FLIP-526: Model ML_PREDICT, ML_EVALUATE > > > Table > > > > > API [1], which has been discussed in this thread [2]. > > > > > > > > > > The vote will be open for at least 72 business hours unless there > is > > an > > > > > objection > > > > > or not enough votes. > > > > > > > > > > Thanks, > > > > > Hao > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API > > > > > [2] > https://lists.apache.org/thread/hlmm9l6qr3pdhs8oc8ltn4pf0s6tg3x4 > > > > > > > > > > > > > > > > > > >
[RESULT][VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Dev, I'm happy to announce that FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API [1] has been accepted with 4 approving votes (4 binding) [2]. Timo Walther (binding) Shengkai Fang (binding) Piotr Nowojski (binding) Robert Metzger (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API [2] https://lists.apache.org/thread/5z18ff83oljdvfl86h0w5ovnvy5h2h51
Re: [VOTE] FLIP-531: Initiate Flink Agents as a new Sub-Project
+1 (non-binding) Thanks, Hao On Mon, Jun 2, 2025 at 11:50 AM Jiaming Xu wrote: > +1 (non-binding) > > > On Jun 2, 2025, at 11:03, Mingge Deng > wrote: > > > > +1 > > > > On Mon, Jun 2, 2025 at 7:53 AM Yuan Mei wrote: > > > >> +1 (binding) > >> > >> On Mon, Jun 2, 2025 at 5:07 PM Martijn Visser > > >> wrote: > >> > >>> +1 (binding) > >>> > >>> On Mon, Jun 2, 2025 at 9:07 AM Robert Metzger > >> wrote: > >>> > +1 (binding) > > > On Mon, Jun 2, 2025 at 8:38 AM Jing Ge > >>> wrote: > > > +1(binding) > > > > Best regards, > > Jing > > > > On Mon, Jun 2, 2025 at 3:31 AM Xintong Song > wrote: > > > >> Hi everyone, > >> > >> I'd like to start a vote on FLIP-531: Initiate Flink Agents as a > >> new > >> Sub-Project [1], which has been discussed in this thread [2]. > >> > >> The vote will be open for at least 72 hours. > >> > >> > >> Best, > >> > >> Xintong > >> > >> > >> [1] > >> > >> > > > > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-531%3A+Initiate+Flink+Agents+as+a+new+Sub-Project > >> > >> [2] > >> https://lists.apache.org/thread/p9m7wvtlw7sc2oh7v0jk3hb60x5x53yy > >> > > > > >>> > >> > >
Re: [DISCUSS] FLIP-440: Annotation naming for PTFs
+1 for option A. It's consistent with Calcite naming. On Thu, Jun 19, 2025 at 11:38 AM Sergey Nuyanzin wrote: > Thanks for raising this > +1 for option A > > On Thu, Jun 19, 2025 at 4:05 PM Gustavo de Morais > wrote: > > > > Hi Timo, > > > > +1 (non-binding) for option A. Thanks for trying to address feedback > > quickly. > > > > Kind regards, > > Gustavo de Morais > > > > On Thu, 19 Jun 2025 at 15:51, Timo Walther wrote: > > > > > Hi everyone, > > > > > > I'm currently polishing FLIP-440, and I would like to apply some last-minute > > > changes before the first release of PTFs for Flink 2.1. I've already > > > collected initial user feedback and it seems that the name for > > > annotations of table arguments is not precise enough. > > > > > > As always, naming is a hard problem in software engineering. > > > > > > For background, please take a look at this docs section: > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/ptfs/#table-semantics-and-virtual-processors > > > > > > Currently, a PTF signature can look like this when taking a table as an > > > argument: > > > > > > // Untyped with set semantics > > > eval(@ArgumentHint(TABLE_AS_SET) Row order); > > > > > > // Typed with set semantics > > > eval(@ArgumentHint(TABLE_AS_SET) Order order); > > > > > > // Untyped with row semantics > > > eval(@ArgumentHint(TABLE_AS_ROW) Row order); > > > > > > // Typed with row semantics > > > eval(@ArgumentHint(TABLE_AS_ROW) Order order); > > > > > > The annotation value confuses people, so I would ask for renaming this > > > part of the API. 
> > > > > > Option A: > > > ROW_SEMANTIC_TABLE > > > SET_SEMANTIC_TABLE > > > > > > Option B: > > > ROW_WISE_TABLE > > > SET_WISE_TABLE > > > > > > Option C: > > > ROW_SCOPED_TABLE > > > SET_SCOPED_TABLE > > > > > > Option D: > > > KEYED_TABLE > > > UNKEYED_TABLE > > > > > > Option E: > > > PARTITIONED_TABLE > > > ROW_WISE_TABLE > > > > > > Option A/B/C are closer to SQL standard and not too far away from > > > current docs. Option D is closer to Flink DataStream API but could be > > > confusing if no PARTITION BY clause is given but still the table could > > > be keyed. Option E neither takes SQL standard nor DataStream API as a > > > reference. > > > > > > Personally, I would vote for Option A. > > > > > > Looking forward to your opinion. > > > > > > Cheers, > > > Timo > > > > > > > > > > > > > -- > Best regards, > Sergey >
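For illustration, here is a sketch of how the signatures from Timo's example would read under Option A, assuming the annotation values are simply renamed one-for-one (the final names depend on how the FLIP is updated):

```java
// Untyped / typed with set semantics (previously TABLE_AS_SET)
eval(@ArgumentHint(SET_SEMANTIC_TABLE) Row order);
eval(@ArgumentHint(SET_SEMANTIC_TABLE) Order order);

// Untyped / typed with row semantics (previously TABLE_AS_ROW)
eval(@ArgumentHint(ROW_SEMANTIC_TABLE) Row order);
eval(@ArgumentHint(ROW_SEMANTIC_TABLE) Order order);
```

The "semantic" wording keeps the direct link to the SQL-standard terms "row semantics" and "set semantics" already used in the PTF documentation.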
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi all, As part of the discussion in AsyncTableFunction PR [1], it's suggested to change the config name from `table.exec.async-table.buffer-capacity` to `table.exec.async-table.max-concurrent-operations`. I think the change makes sense and to make the model async config name consistent with async table, I'm proposing to change the proposed config in this FLIP-525 from `table.exec.async-ml-predict.buffer-capacity` to `table.exec.async-ml-predict.max-concurrent-operations`. What do people think? If there's no objection, I'll go ahead and make the change in FLIP and code since the feature isn't released yet. Thanks, Hao [1] https://github.com/apache/flink/pull/26567/files#r2144129856 On Mon, May 12, 2025 at 9:47 AM Hao Li wrote: > Hi all, > > I was reminded that 72 hours means business days. So the vote is still > open :) Sorry about the confusion. > > Thanks, > Hao > > On Mon, May 12, 2025 at 1:18 AM Sergey Nuyanzin > wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 9:55 AM Piotr Nowojski >> wrote: >> > >> > +1 (binding) >> > >> > Best, >> > Piotrek >> > >> > niedz., 11 maj 2025 o 18:14 Hao Li >> napisał(a): >> > >> > > Hi Dev, >> > > >> > > Thanks all for voting. I'm closing the vote and the result will be >> posted >> > > in a separate email. >> > > >> > > Thanks, >> > > Hao >> > > >> > > On Fri, May 9, 2025 at 12:43 AM Martijn Visser < >> martijnvis...@apache.org> >> > > wrote: >> > > >> > > > +1 (binding) >> > > > >> > > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou < >> flink.zhouyunf...@gmail.com >> > > > >> > > > wrote: >> > > > >> > > > > +1(non-binding) >> > > > > >> > > > > Best, >> > > > > Yunfeng >> > > > > >> > > > > > 2025年5月9日 00:01,Hao Li 写道: >> > > > > > >> > > > > > Hi everyone, >> > > > > > >> > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, >> ML_EVALUATE >> > > > > > Implementation Design [1], which has been discussed in this >> thread >> > > [2]. 
>> > > > > > >> > > > > > The vote will be open for at least 72 hours unless there is an >> > > > objection >> > > > > > or not enough votes. >> > > > > > >> > > > > > Thanks, >> > > > > > Hao >> > > > > > >> > > > > > [1] >> > > > > > >> > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design >> > > > > > [2] >> https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x >> > > > > >> > > > > >> > > > >> > > >> >> >> >> -- >> Best regards, >> Sergey >> >
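To make the proposed rename concrete, here is a sketch of what it would look like in a SQL session (the key is only a proposal until the FLIP and code are updated):

```sql
-- Previously proposed in FLIP-525:
-- SET 'table.exec.async-ml-predict.buffer-capacity' = '10';

-- Proposed replacement, consistent with the AsyncTableFunction config
-- 'table.exec.async-table.max-concurrent-operations':
SET 'table.exec.async-ml-predict.max-concurrent-operations' = '10';
```

Since the feature is unreleased, no deprecated-key fallback would be needed for the ML_PREDICT config.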
Re: [ANNOUNCE] Flink 2.1 feature freeze
Hi Ron, For FLIP-525, as discussed in [1], there's a config name change. I have a PR [2] for it which needs to be merged into 2.1 release. Thanks, Hao [1] https://lists.apache.org/thread/t979kpqs8v6zg5t8bgql3wwf27t7th9b [2] https://github.com/apache/flink/pull/26706 On Fri, Jun 20, 2025 at 7:56 PM Ron Liu wrote: > Hi, dev > > The feature freeze of 2.1 has started now. That means that no new features > or improvements should now be merged into the master branch unless you ask > the release managers first, which has already been done for PRs, or pending > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > - *Cutting release branch* > > Currently we have no blocker issues(beside tickets that used for > release-testing). > > We are planning to cut the release branch on next Wednesday (June 25) if > no new test instabilities, and we'll make another announcement in the > dev mailing list then. > > > - *Cross-team testing* > > The release testing will start right after we cut the release branch, which > is expected to come in the next week. As a prerequisite of it, please > complete the documentation of your new feature and mark the related JIRA > issue in the 2.1 release wiki page [1] before we start testing, which > would be quite helpful for other developers to validate your features. > > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.1+Release > > Best, > Ron >
Re: [SUMMARY] Flink 2.1 Release Sync 06/04/2025
Hi Ron, Thanks for driving this. Can you also add FLIP-507 [1] to the 2.1 release? There's one PR for this FLIP and it can be finished before the freeze. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API On Wed, Jun 4, 2025 at 12:44 AM Ron Liu wrote: > Hi, devs, > > This is the fourth meeting of the Flink 2.1 release cycle. I'd like to share > the information synced in the meeting. > > > - Feature Freeze > > It is worth noting that there are only 2 weeks left until the > feature freeze (June 21, 2025), and developers need to keep that date > in mind. > > > - Features: > > So far we've had 16 FLIPs/features, and five FLIPs have been completed in > 2.1. > Regarding the remaining 11 FLIPs, 8 FLIPs are confirmed to be completed > in 2.1, > and the status of the other three is uncertain. > > Progress on two very important FLIPs, FLIP-486 & FLIP-521, is still zero, > and there are some concerns about whether they can be ready for the feature freeze > of 2.1. I will confirm it with the corresponding contributors. > > - CI: > > So far, we have four unstable cases and no blocker issues; > all of the unstable cases have JIRA issues created: >1. https://issues.apache.org/jira/browse/FLINK-37703: This was an > occasional unstable case caused by a test-dependent HDFS, and we found a > community contributor to locate it >2. https://issues.apache.org/jira/browse/FLINK-37822: Piotr will fix > it. >3. https://issues.apache.org/jira/browse/FLINK-37701: We've asked Piotr to > fix it. >4. https://issues.apache.org/jira/browse/FLINK-37842: This is a problem > with the test case itself. No contributor has been found yet. We are still > monitoring it. > >In addition, we have recently frequently encountered 502 & 503 Proxy > Errors in Maven deploy. The response from ASF infra is that this is a hardware problem, > and they are in the process of migrating data to a replacement server. 
For > details, see: https://issues.apache.org/jira/browse/INFRA-26874 > > - Benchmark > Since May 19, some serde-related performance regressions have occurred; > see the issue below for details. We need a volunteer to locate the root cause by > reproducing it locally. Since it only occurs on JDK11, Zakelly believes it > may not be a problem, so it is not a blocker now. > > 1. https://issues.apache.org/jira/browse/FLINK-37826 > > - Sync meeting: > > As code freeze is approaching, the release sync is adjusted from once every two weeks to > once a week, so the next meeting time is 06/11/2025. Please feel free to > join us [2]! > > > Lastly, we encourage attendees to fill out the topics to be discussed at > the bottom of the 2.1 wiki page [1] a day in advance, to make it easier for > everyone to understand the background of the topics, thanks! > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.1+Release > [2] https://meet.google.com/pog-pebb-zrj >
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Shengkai, Leonard, Thanks for the feedback. For Shengkai's feedback: > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the external system cannot handle this exception. You are right, we can make it throw `Exception`. > 2. I think we can also modify the create catalog ddl syntax. ``` CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod WITH ( 'type' = 'jdbc' ); ``` Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a chicken-and-egg problem. I think for connections to be used for catalog creation, it needs to be a system connection, similar to system functions, which exist outside of the CatalogManager. Maybe we can defer this to later functionality? > 3. It seems the connector factory should merge the with options and connection options together and then create the source/sink. It's better if the framework can merge all these options so connectors don't need any code changes. I think it's better to separate connection options from table options and make it explicit. The reason is: the connection here should be a decrypted one. It's sensitive and should be handled carefully regarding logging, usage, etc. Mixing it with the original table options makes that hard to do. But the FLIP used `CatalogConnection`, which is an encrypted one. I'll update it to `SensitiveConnection`. > 4. Why do we need to introduce ConnectionFactory? I think connection is like CatalogTable. It should hold the basic information and all information in the connection should be stored into secret store. The main reason is to enable user-defined connection handling for different types. For example, if the connection type is `basic`, the corresponding factory can handle basic-type secrets (e.g. extract username/password from the connection options and do encryption). For Leonard's feedback: > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend SecretStore? Good catch. Let me update it. 
> (2) Configurability of SECRET_FIELDS : Could the hardcoded SECRET_FIELDS in BasicConnectionFactory be made configurable (e.g., 'token' vs 'accessKey') for better connector compatibility? This depends on the `ConnectionFactory` implementation and can be defined by the user. > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection returns SensitiveConnection, while BasicConnectionFactory#resolveConnection returns Map. Should these be aligned? Good catch. Let me update the FLIP. > (4)Framework-level Resolution : +1 to Shengkai's point about having the framework (DynamicTableFactory) return complete options to reduce connector adaptation cost. Please see my explanation for Shengkai's similar question. > (5)Secret ID Handling : When no encryption is needed, secretId is null (from secrets.isEmpty() ? null : secretStore.storeSecret(secrets)). This behavior should be explicitly documented in the interfaces. It's an example in the FLIP about how `ConnectionFactory` could be implemented. How encryption is done depends on the actual implementation though. Let me update the example to make it clearer. Thanks, Hao On Thu, Jul 24, 2025 at 5:04 AM Leonard Xu wrote: > Hi friends > > I like the updated FLIP goals, that’s what I want. I have some feedback: > > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend > SecretStore? > (2) Configurability of SECRET_FIELDS : Could the hardcoded SECRET_FIELDS > in BasicConnectionFactory be made configurable (e.g., 'token' vs > 'accessKey') for better connector compatibility? > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection returns > SensitiveConnection, while BasicConnectionFactory#resolveConnection returns > Map. Should these be aligned? > (4)Framework-level Resolution : +1 to Shengkai's point about having the > framework (DynamicTableFactory) return complete options to reduce connector > adaptation cost. 
> (5)Secret ID Handling : When no encryption is needed, secretId is null > (from secrets.isEmpty() ? null : secretStore.storeSecret(secrets)). This > behavior should be explicitly documented in the interfaces. > > Best, > Leonard > > > Shengkai Fang wrote on Jul 24, 2025 at 11:44: > > > > hi. > > > > Sorry for the late reply. I just have some questions: > > > > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the > > external system cannot handle this exception. > > > > 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > >'type' = 'jdbc' > > ); > > ``` > > > > 3. It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better if the framework can merge all these options so connectors don't need > > any code changes. > > > > 4. Why do we need to introduce ConnectionFactory? I think connection is like > > CatalogTable. It should hold the basic information and all information in > > the connection should be stored into secret store. > > > > > > Best, > > Shengkai > > > >
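To make the separation between connection options and table options concrete, here is a hypothetical sketch of the direction discussed above. The exact DDL and option names are defined by FLIP-529; in particular, the `'connection'` table option shown below is illustrative only, not the FLIP's confirmed syntax:

```sql
-- Define a connection that carries the sensitive options,
-- so they can be encrypted and handled separately from table options:
CREATE CONNECTION mysql_prod
WITH (
  'type' = 'basic',
  'username' = 'flink',
  'password' = '******'
);

-- Reference it from a table instead of inlining credentials:
CREATE TABLE orders (
  order_id BIGINT,
  amount DECIMAL(10, 2)
) WITH (
  'connector' = 'jdbc',
  'connection' = 'mysql_prod'
);
```

Keeping the secrets behind a named connection is what allows the framework to control logging and redaction, which is the motivation for resolving them via a separate `SensitiveConnection` rather than merging them silently into the table options.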
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Yanquan, Thanks for the feedback. > (1) I agree with the design of SecretStore. I hope to add the results of the 'SHOW Connections' query to the document; 'secret.id' should be included in the results. This helps us establish the association between tables and connections. I think we can output the "secret.id" field in the `describe connection` query. `show connections` can show the names of connections, similar to other show commands such as `show tables` or `show models`? > Can you clearly point out this point. It seems that we cannot create a CatalogConnection under any catalog. For example, we cannot save CatalogConnection in the JDBC catalog. I didn't mean we can't create a connection under a catalog. I meant we can't create a catalog using a connection if the connection exists in a catalog. We need to introduce some System Connection later to be used to create catalogs. If you already have a JDBC catalog, we can still save a CatalogConnection in it. Thanks, Hao On Sun, Jul 27, 2025 at 7:24 PM Yanquan Lv wrote: > Hi Mayank and Hao, > Thanks for the updates and comments. Here are some of my opinions: > > (1) I agree with the design of SecretStore. I hope to add the results of > the 'SHOW Connections' query to the document; 'secret.id' should be > included in the results. This helps us establish the association between > tables and connections. > (2) > > I think for connections to be used for catalog > > creation, it needs to be a system connection, similar to system functions, > which > > exist outside of the CatalogManager. > Can you clearly point out this point. It seems that we cannot create a > CatalogConnection under any catalog. For example, we cannot save > CatalogConnection in the JDBC catalog. > > Hao Li wrote on Fri, Jul 25, 2025 at 06:54: > > Hi Shengkai, Leonard, > > > > Thanks for the feedback. > > > > For Shengkai's feedback: > > > > > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the > > external system cannot handle this exception. 
> > You are right, we can make it throw `Exception`. > > > > > 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > > 'type' = 'jdbc' > > ); > > ``` > > > > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a > > chicken-and-egg problem. I think for connections to be used for catalog > > creation, it needs to be a system connection, similar to system functions, > which > > exist outside of the CatalogManager. Maybe we can defer this to later > > functionality? > > > > > 3. It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better if the framework can merge all these options so connectors don't > need > > any code changes. > > > > I think it's better to separate connection options from table options and make > it > > explicit. The reason is: the connection here should be a decrypted one. It's > > sensitive and should be handled carefully regarding logging, usage, etc. > > Mixing it with the original table options makes that hard to do. But the FLIP used > > `CatalogConnection`, which is an encrypted one. I'll update it to > > `SensitiveConnection`. > > > > > 4. Why do we need to introduce ConnectionFactory? I think connection is > like > > CatalogTable. It should hold the basic information and all information in > > the connection should be stored into secret store. > > > > The main reason is to enable user-defined connection handling for > different > > types. For example, if the connection type is `basic`, the corresponding > > factory can handle basic-type secrets (e.g. extract username/password > from > > the connection options and do encryption). > > > > For Leonard's feedback: > > > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend > > SecretStore? > > > > Good catch. Let me update it. 
> > > (2) Configurability of SECRET_FIELDS : Could the hardcoded > SECRET_FIELDS > > in BasicConnectionFactory be made configurable (e.g., 'token' vs > > 'accessKey') for better connector compatibility? > > > > This depends on the `ConnectionFactory` implementation and can be defined > > by the user. > > > > > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection > > returns SensitiveConnection, while > BasicConnectionFactory#resolveConnection > > returns Map. Should these be aligned?
Re: [VOTE] FLIP-400: AsyncScalarFunction for asynchronous scalar function support
+1 to change the config name from buffer-capacity to max-concurrent-operations. Thanks, Hao On Thu, Jul 24, 2025 at 2:26 PM Alan Sheinberg wrote: > Hi, > > New functionality for AsyncTableFunction ( > https://github.com/apache/flink/pull/26567) was recently added, which is > essentially the same async behavior, but for table functions. It contains > similar configurations as for AsyncScalarFunction. While reviewing, we > decided to change the config name of buffer-capacity to be > max-concurrent-operations, since it's more intuitive. I am going to make a > similar change to AsyncScalarFunction, including backwards support for the > existing config name using withDeprecatedKeys. If there are any issues, > please let me know. > > Thanks, > Alan > > On Wed, Jan 3, 2024 at 11:29 AM Alan Sheinberg > wrote: > > > Thank you everyone for participating in the vote. I'm closing this vote > > now and will announce the results in a separate thread. > > > > -Alan > > > > On Tue, Jan 2, 2024 at 8:40 AM Piotr Nowojski > > wrote: > > > >> +1 (binding) > >> > >> Best, > >> Piotrek > >> > >> czw., 28 gru 2023 o 09:19 Timo Walther napisał(a): > >> > >> > +1 (binding) > >> > > >> > Cheers, > >> > Timo > >> > > >> > > Am 28.12.2023 um 03:13 schrieb Yuepeng Pan : > >> > > > >> > > +1 (non-binding). > >> > > > >> > > Best, > >> > > Yuepeng Pan. > >> > > > >> > > > >> > > > >> > > > >> > > At 2023-12-28 09:19:37, "Lincoln Lee" > wrote: > >> > >> +1 (binding) > >> > >> > >> > >> Best, > >> > >> Lincoln Lee > >> > >> > >> > >> > >> > >> Martijn Visser 于2023年12月27日周三 23:16写道: > >> > >> > >> > >>> +1 (binding) > >> > >>> > >> > >>> On Fri, Dec 22, 2023 at 1:44 AM Jim Hughes > >> > > >> > >>> wrote: > >> > > >> > Hi Alan, > >> > > >> > +1 (non binding) > >> > > >> > Cheers, > >> > > >> > Jim > >> > > >> > On Wed, Dec 20, 2023 at 2:41 PM Alan Sheinberg > >> > wrote: > >> > > >> > > Hi everyone, > >> > > > >> > > I'd like to start a vote on FLIP-400 [1]. 
It covers introducing > a > >> new > >> > >>> UDF > >> > > type, AsyncScalarFunction for completing invocations > >> asynchronously. > >> > >>> It > >> > > has been discussed in this thread [2]. > >> > > > >> > > I would like to start a vote. The vote will be open for at > least > >> 72 > >> > >>> hours > >> > > (until December 28th 18:00 GMT) unless there is an objection or > >> > > insufficient votes. > >> > > > >> > > [1] > >> > > > >> > > > >> > >>> > >> > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support > >> > > [2] > >> https://lists.apache.org/thread/q3st6t1w05grd7bthzfjtr4r54fv4tm2 > >> > > > >> > > Thanks, > >> > > Alan > >> > > > >> > >>> > >> > > >> > > >> > > >
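For readers following the rename above, it would surface to SQL users roughly as below. This is a sketch: the full option keys are assumptions based on the existing `table.exec.async-scalar.*` option group, and the thread only confirms the suffix change from `buffer-capacity` to `max-concurrent-operations` (with the old key kept as a deprecated alias via `withDeprecatedKeys`):

```sql
-- Sketch only: assuming the async scalar options live under
-- 'table.exec.async-scalar'. The new key caps in-flight async calls.
SET 'table.exec.async-scalar.max-concurrent-operations' = '10';

-- The old key would keep working as a deprecated alias:
-- SET 'table.exec.async-scalar.buffer-capacity' = '10';
```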
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Leonard, Thanks for the feedback and offline sync yesterday. > I think connection is a very common need for most catalogs like MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need a connection. I added a `TEMPORARY SYSTEM` connection so it's a global-level connection which can be used for catalog creation. After syncing with Timo, we propose to store it first in memory like `TEMPORARY SYSTEM FUNCTION`, since this FLIP is already introducing lots of concepts and interfaces. We can provide a `SYSTEM` connection and related interfaces to persist it in follow-up FLIPs. > In this case, I think reducing connector development cost is more important than making it explicit; the connector knows which options are sensitive or not. Sure. I updated the FLIP to merge connection options with table options so it's easier for current connectors. > I hope the BasicConnectionFactory can be a common one that can feed most common user cases, otherwise encrypting all options is a good idea. I updated it to `DefaultConnectionFactory`, which handles most of the secret keys. Thanks, Hao On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > Hey, Hao > > Please see my comments as follows: > > >> 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > >'type' = 'jdbc' > > ); > > ``` > > > > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a > > chicken-egg problem. I think for connections to be used for catalog > > creation, it needs to be a system connection similar to a system function > which > > exists outside of CatalogManager. Maybe we can defer this to later > > functionality? > > I think connection is a very common need for most catalogs like > MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need > a connection. > > > >> 3.
It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better that the framework can merge all these options so connectors don't need > > any code. > > > > I think it's better to separate connections from table options and make it > > explicit. The reason is: the connection here should be a decrypted one. It's > > sensitive and should be handled carefully regarding logging, usage, etc. > > Mixing it with the original table options makes that hard to do. But the FLIP used > > `CatalogConnection`, which is an encrypted one. I'll update it to > > `SensitiveConnection`. > >> (4) Framework-level Resolution: +1 to Shengkai's point about having the > > framework (DynamicTableFactory) return complete options to reduce > connector > > adaptation cost. > > > > Please see my explanation for Shengkai's similar question. > > In this case, I think reducing connector development cost is more > important than making it explicit; the connector knows which options are > sensitive or not. > > > >> 4. Why do we need to introduce ConnectionFactory? I think a connection is > like > > CatalogTable. It should hold the basic information, and all information in > > the connection should be stored in the secret store. > > > > The main reason is to enable user-defined connection handling for > different > > types. For example, if the connection type is `basic`, the corresponding > > factory can handle basic-type secrets (e.g. extract username/password > from > > connection options and do encryption). > > > >> (2) Configurability of SECRET_FIELDS: Could the hardcoded SECRET_FIELDS > > in BasicConnectionFactory be made configurable (e.g., 'token' vs > > 'accessKey') for better connector compatibility? > > > > This depends on the `ConnectionFactory` implementation and can be self-defined > > by the user.
> > I hope the BasicConnectionFactory can be a common one that can feed most > common user cases, otherwise encrypting all options is a good idea. > > Btw, I also want to push the FLIP forward and start a vote ASAP, thus a > meeting is welcome if you think it can help finalize the discussion > thread. > > > Best, > Leonard > > > > > > >> Hi friends, > >> > >> I like the updated FLIP goals, that's what I want. I have some feedback: > >> > >> (1) Minor: Interface Hierarchy: Why doesn't WritableSecretStore extend > >> SecretStore? > >> (2) Configurability of SECRET_FIELDS: Could the hardcoded SECRET_FIELDS > >> in BasicConnectionFactory be made configurable (e.g., 'token' vs > >> 'accessKey') for better connector compatibility? > >> (3) Inconsistent Return Types: ConnectionFactory#resolveConnection > returns > >> SensitiveConnection, while BasicConnectionFactory#resolveConnection > returns > >> Map. Should these be aligned? > >> (4) Framework-level Resolution: +1 to Shengkai's point about having the > >> framework (DynamicTableFactory) return complete options to reduce > connector > >> adaptation cost. > >> (5) Secret ID Handling: When no encryption is needed, secretId is null > >> (from secrets.isEmpty() ? null : secretStore.storeSe
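Pulling the pieces of this thread together, a minimal end-to-end sketch of the proposed DDL might look as follows. The `CREATE CATALOG ... USING CONNECTION` shape is taken from the example quoted above; the connection option keys (`type`, `username`, `password`) and the single-part system connection name are illustrative assumptions, not final FLIP-529 syntax:

```sql
-- Sketch only: option names are illustrative, secrets would be handled
-- by the secret store rather than stored in plain text.
CREATE TEMPORARY SYSTEM CONNECTION mysql_prod
WITH (
  'type' = 'basic',
  'username' = 'flink',
  'password' = '******'
);

-- Reuse the connection when creating a catalog, as discussed above.
CREATE CATALOG cat USING CONNECTION mysql_prod
WITH (
  'type' = 'jdbc'
);
```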
Re: [DISCUSS] FLIP-540: Support VECTOR_SEARCH in Flink SQL
Hi Shengkai, Thanks for the FLIP and enhancement for AI capabilities in Flink. +1. Thanks, Hao On Tue, Jul 29, 2025 at 2:16 AM Shengkai Fang wrote: > Hi, > I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in Flink > SQL[1]. > > In FLIP-437/FLIP-525, Apache Flink has initially integrated Large Language > Model (LLM) capabilities, enabling semantic understanding and real-time > processing of streaming data pipelines. This integration has been > technically validated in scenarios such as log classification and real-time > question-answering systems. However, the current architecture allows Flink > to only use embedding models to convert unstructured data (e.g., text, > images) into high-dimensional vector features, which are then persisted to > downstream storage systems (e.g., Milvus, Mongodb). It lacks real-time > online querying and similarity analysis capabilities for vector spaces. To > address this limitation, we propose introducing the VECTOR_SEARCH function > in this FLIP, enabling users to perform streaming vector similarity > searches and real-time context retrieval (e.g., Retrieval-Augmented > Generation, RAG) directly within Flink. > > Looking forward to comments and suggestions for improvements! > > Best, > Shengkai > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL >
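To make the RAG use case above concrete, a query might take roughly the following shape. This is a hypothetical sketch modeled on Flink's ML_PREDICT-style table-function syntax; the argument list, `DESCRIPTOR` usage, and table names are assumptions, not the actual FLIP-540 grammar:

```sql
-- Hypothetical shape of a streaming top-k similarity lookup.
SELECT q.question, s.chunk, s.score
FROM questions AS q,
  LATERAL TABLE(
    VECTOR_SEARCH(
      TABLE vector_index,        -- e.g. backed by Milvus or MongoDB
      DESCRIPTOR(embedding),     -- column holding the stored vectors
      q.question_embedding,      -- query vector from an embedding model
      10                         -- top-k results per input row
    )
  ) AS s;
```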
Re: [VOTE] FLIP-539: Support WITH Clause in CREATE FUNCTION Statement in Flink SQL
+1 (non-binding) Thanks, Hao On Fri, Aug 1, 2025 at 5:33 PM Sergey Nuyanzin wrote: > +1 (binding) > > On Fri, Aug 1, 2025 at 5:18 PM Ramin Gharib wrote: > > > > Hi David, > > > > +1 (non-binding) > > > > Thanks for the FLIP and the effort! > > Best, > > > > Ramin > > > > On Fri, Aug 1, 2025 at 4:18 PM Dawid Wysakowicz > > wrote: > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-539: Support WITH Clause in CREATE > > > FUNCTION Statement in Flink SQL [1], which has been discussed in this > > > thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > Thanks, > > > Dawid > > > > > > [1] https://cwiki.apache.org/confluence/x/sg9JFg > > > [2] https://lists.apache.org/thread/2mmc454n2ol2cxdhbh7mz77w62vhstd4 > > > > > > > -- > Best regards, > Sergey >
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Timo, Thanks for the suggestion. I updated the FLIP with `ObjectIdentifier` taking only objectName and made it clear that the system connection uses only a simple identifier instead of compound identifier. Thanks, Hao On Thu, Jul 31, 2025 at 6:12 AM Timo Walther wrote: > Thanks for updating the FLIP and starting a VOTE thread. I have two last > minute questions before I cast my vote: > > It seems the definition of TEMPORARY SYSTEM CONNECTION is not precise > enough. In my point of view, system connections are single part > identifiers (similar to functions). And catalog connections are 3-part > identifiers. > > So will we support the following DDL: > > CREATE TEMPORARY SYSTEM CONNECTION global_connection WITH (...); > > CREATE TABLE t (...) USING CONNECTION global_connection > > I think we should. But then we need to update the > `CatalogTable.getConnection(): Optional` interface, > because ObjectIdentifier currently only supports 3 part identifiers. > > I suggest we modify ObjectIdentifier and let it have either 3 or 1 part. > Then we can also remove the need for FunctionIdentifier which exists for > exactly this purpose. > > What do you think? > > Cheers, > Timo > > > On 30.07.25 05:36, Leonard Xu wrote: > > Thanks Hao for the effort to push the FLIP forward, I believe current > design would satisfy most user cases. > > > > +1 to start a vote. > > > > Best, > > Leonard > > > >> 2025 7月 30 02:27,Hao Li 写道: > >> > >> Sorry, I forgot to mention I also updated the FLIP to include table apis > >> for connection. It was originally in examples but not in the public api > >> section. > >> > >> On Tue, Jul 29, 2025 at 10:12 AM Hao Li wrote: > >> > >>> Hi Leonard, > >>> > >>> Thanks for the feedback and offline sync yesterday. > >>> > >>>> I think connection is a very common need for most catalogs like > >>> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs > need > >>> a connection. 
> >>> I added `TEMPORARY SYSTEM` connection so it's a global level connection > >>> which can be used for Catalog creation. After syncing with Timo, we > propose > >>> to store it first in memory like `TEMPORARY SYSTEM FUNCTION` since this > >>> FLIP is already introducing lots of concepts and interfaces. We can > provide > >>> `SYSTEM` connection and related interfaces to persist it in following > up > >>> FLIPs. > >>> > >>>> In this case, I think reducing connector development cost is more > >>> important than making is explicit, the connector knows which options is > >>> sensitive or not. > >>> Sure. I updated the FLIP to merge connection options with table > options so > >>> it's easier for current connectors. > >>> > >>>> I hope the BasicConnectionFactory can be a common one that can feed > most > >>> common users case, otherwise encrypt all options is a good idea. > >>> I updated to `DefaultConnectionFactory` which handles most of the > secret > >>> keys. > >>> > >>> Thanks, > >>> Hao > >>> > >>> On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > >>> > >>>> Hey, Hao > >>>> > >>>> Please see my comments as follows: > >>>> > >>>>>> 2. I think we can also modify the create catalog ddl syntax. > >>>>> > >>>>> ``` > >>>>> CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > >>>>> WITH ( > >>>>>'type' = 'jdbc' > >>>>> ); > >>>>> ``` > >>>>> > >>>>> Does `mycat.mydb.mysql_prod` exist in another catalog? This feels > like a > >>>>> chicken-egg problem. I think for connections to be used for catalog > >>>>> creation, it needs to be system connection similar to system function > >>>> which > >>>>> exist outside of CatalogManager. Maybe we can defer this to later > >>>>> functionality? > >>>> > >>>> I think connection is a very common need for most catalogs like > >>>> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these > catalogs need > >>>> a connection. > >>>> > >>>> > >>>>>> 3. It seems the connector factory should merge the with
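Timo's "either 3 parts or 1 part" proposal can be sketched as a tiny value class. This is an illustration of the invariant only, not the actual `ObjectIdentifier` implementation in Flink (which would also need serialization, equals/hashCode, and anonymous-identifier handling):

```java
import java.util.Objects;

// Illustration only: an identifier that is either a single-part system
// identifier (catalog/database absent) or a full catalog.database.object path.
final class Identifier {
    private final String catalog;  // null for 1-part identifiers
    private final String database; // null for 1-part identifiers
    private final String name;

    private Identifier(String catalog, String database, String name) {
        this.catalog = catalog;
        this.database = database;
        this.name = Objects.requireNonNull(name);
    }

    // 1-part form, e.g. a TEMPORARY SYSTEM CONNECTION or system function.
    static Identifier ofSystem(String name) {
        return new Identifier(null, null, name);
    }

    // 3-part form, e.g. a catalog connection or table.
    static Identifier of(String catalog, String database, String name) {
        return new Identifier(
            Objects.requireNonNull(catalog), Objects.requireNonNull(database), name);
    }

    boolean isSystem() {
        return catalog == null;
    }

    String asSerializableString() {
        return isSystem() ? name : catalog + "." + database + "." + name;
    }
}
```

Collapsing the 1-part case into the same type is what would let `CatalogTable.getConnection()` return one identifier type for both system and catalog connections, removing the need for a separate `FunctionIdentifier`-style class.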
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Sorry, I forgot to mention I also updated the FLIP to include table apis for connection. It was originally in examples but not in the public api section. On Tue, Jul 29, 2025 at 10:12 AM Hao Li wrote: > Hi Leonard, > > Thanks for the feedback and offline sync yesterday. > > > I think connection is a very common need for most catalogs like > MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need > a connection. > I added `TEMPORARY SYSTEM` connection so it's a global level connection > which can be used for Catalog creation. After syncing with Timo, we propose > to store it first in memory like `TEMPORARY SYSTEM FUNCTION` since this > FLIP is already introducing lots of concepts and interfaces. We can provide > `SYSTEM` connection and related interfaces to persist it in following up > FLIPs. > > > In this case, I think reducing connector development cost is more > important than making is explicit, the connector knows which options is > sensitive or not. > Sure. I updated the FLIP to merge connection options with table options so > it's easier for current connectors. > > > I hope the BasicConnectionFactory can be a common one that can feed most > common users case, otherwise encrypt all options is a good idea. > I updated to `DefaultConnectionFactory` which handles most of the secret > keys. > > Thanks, > Hao > > On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > >> Hey, Hao >> >> Please see my comments as follows: >> >> >> 2. I think we can also modify the create catalog ddl syntax. >> > >> > ``` >> > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod >> > WITH ( >> >'type' = 'jdbc' >> > ); >> > ``` >> > >> > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a >> > chicken-egg problem. I think for connections to be used for catalog >> > creation, it needs to be system connection similar to system function >> which >> > exist outside of CatalogManager. Maybe we can defer this to later >> > functionality? 
>> >> I think connection is a very common need for most catalogs like >> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need >> a connection. >> >> >> >> 3. It seems the connector factory should merge the with options and >> > connection options together and then create the source/sink. It's >> > better that framework can merge all these options and connectors don't >> need >> > any codes. >> > >> > I think it's better to separate connections with table options and make >> it >> > explicit. Reasons is: the connection here should be a decrypted one. >> It's >> > sensitive and should be handled carefully regarding logging, usage etc. >> > Mixing with original table options makes it hard to do. But the Flip >> used >> > `CatalogConnection` which is an encrypted one. I'll update it to >> > `SensitveConnection`. >> >> (4)Framework-level Resolution : +1 to Shengkai's point about having the >> > framework (DynamicTableFactory) return complete options to reduce >> connector >> > adaptation cost. >> > >> > Please see my explanation for Shengkai's similar question. >> >> In this case, I think reducing connector development cost is more >> important than making is explicit, the connector knows which options is >> sensitive or not. >> >> >> >> 4. Why we need to introduce ConnectionFactory? I think connection is >> like >> > CatalogTable. It should hold the basic information and all information >> in >> > the connection should be stored into secret store. >> > >> > The main reason is to enable user defined connection handling for >> different >> > types. For example, if connection type is `basic`, the corresponding >> > factory can handle basic type secrets (e.g. extract username/password >> from >> > connection options and do encryption). >> > >> >> (2) Configurability of SECRET_FIELDS : Could the hardcoded >> SECRET_FIELDS >> > in BasicConnectionFactory be made configurable (e.g., 'token' vs >> > 'accessKey') for better connector compatibility? 
>> > >> > This depends on `ConnectionFactory` implementation and can be self >> defined >> > by user. >> >> I hope the BasicConnectionFactory can be a common one that can feed most >> common users case, otherwis
Re: [DISCUSS] FLIP-539: Support WITH Clause in CREATE FUNCTION Statement in Flink SQL
Hi Dawid, Thanks for the FLIP. +1 to support options for functions as well. I have one question: Do you want to update `createTemporarySystemFunction` in `TableEnvironment` to support `FunctionDescriptor` as well? Thanks, Hao On Tue, Jul 29, 2025 at 10:33 AM Yash Anand wrote: > Hi Dawid, > > Thank you for initiating this FLIP, looks like a useful addition. +1 for the > FLIP. > > I have just one question: will the option keys be limited to some > predefined values, or could they be any key the user wants to add as metadata? > Like if the user wants to add a description for each argument? > > On Tue, Jul 29, 2025 at 7:09 AM Timo Walther wrote: > > > Maybe not in the first version, but eventually nothing in the design > > blocks us from supporting this. The SecretStore would need to be > > available in the FunctionDefinitionFactory for this. > > > > Cheers, > > Timo > > > > On 29.07.25 15:08, Ryan van Huuksloot wrote: > > > Overall the FLIP looks good to me. > > > > > > Would the properties support the SecretStore proposed in FLIP-529? > > > > > > Otherwise, +1, thanks! > > > > > > Ryan van Huuksloot > > > Staff Engineer, Infrastructure | Streaming Platform > > > > > > On Tue, Jul 29, 2025 at 4:19 AM Jacky Lau > wrote: > > > > > >> Thanks for initiating this! > > >> > > >> +1 for this proposal. > > >> > > >> Sergey Nuyanzin wrote on Tue, Jul 29, 2025 at 15:34: > > >>> Thanks for driving this Dawid. > > >>> > > >>> looks reasonable to me > > >>> > > >>> On Mon, Jul 28, 2025 at 5:03 PM Ramin Gharib > > >>> wrote: > > > > Hello Dawid, > > > > Thanks for initiating this! The FLIP looks well-written. > > The WITH clause brings consistency to existing syntax. > > > > +1 for this proposal.
> > > > On Mon, Jul 28, 2025 at 3:33 PM Dawid Wysakowicz < > > >> dwysakow...@apache.org > > > > wrote: > > > > > Hi, > > > I'd like to start a discussion of FLIP-539: Support WITH Clause in > > >>> CREATE > > > FUNCTION Statement in Flink SQL [1]. > > > > > > The existing CREATE FUNCTION statement in Flink SQL allows users > to > > > register user-defined functions (UDFs) by specifying the class name > > >>> and the > > > artifact (JAR) containing the implementation. While this design > > >> covers > > > common use cases, it lacks a declarative mechanism for associating > > > arbitrary properties or metadata with the function at creation > time. > > > > > > Other Flink SQL objects—such as tables—support a WITH clause for > > > specifying options in a key-value fashion, improving consistency, > > > discoverability, and extensibility. > > > Looking forward to comments and suggestions for improvements! > > > > > > Best, > > > Dawid > > > > > > [1] https://cwiki.apache.org/confluence/x/sg9JFg > > > > > >>> > > >>> > > >>> > > >>> -- > > >>> Best regards, > > >>> Sergey > > >>> > > >> > > > > > > > >
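For illustration, the WITH clause under discussion would extend the existing CREATE FUNCTION syntax roughly as below. The clause placement follows the table DDL convention the thread points to; the option keys shown (`description`, `argument.0.description`) are invented placeholders, not keys defined by FLIP-539:

```sql
-- Sketch: a WITH clause appended to today's CREATE FUNCTION statement.
CREATE FUNCTION my_scalar AS 'com.example.MyScalarFunction'
LANGUAGE JAVA
USING JAR 'file:///tmp/udf.jar'
WITH (
  'description' = 'Scores an input string',
  'argument.0.description' = 'the raw input string'
);
```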
[VOTE] FLIP-529: Connections in Flink SQL and TableAPI
Hi everyone, I'd like to start a vote on FLIP-529: Connections in Flink SQL and TableAPI [1], which has been discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-529%3A+Connections+in+Flink+SQL+and+TableAPI [2] https://lists.apache.org/thread/np8qx40tmhby7tv3dy71pd75nrc094rc
[jira] [Created] (FLINK-34992) FLIP-437: Support ML Models in Flink SQL
Hao Li created FLINK-34992: -- Summary: FLIP-437: Support ML Models in Flink SQL Key: FLINK-34992 URL: https://issues.apache.org/jira/browse/FLINK-34992 Project: Flink Issue Type: New Feature Reporter: Hao Li This is an umbrella task for FLIP-437. FLIP-437: https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34993) Support Model CRUD in parser
Hao Li created FLINK-34993: -- Summary: Support Model CRUD in parser Key: FLINK-34993 URL: https://issues.apache.org/jira/browse/FLINK-34993 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35013) Support temporary model
Hao Li created FLINK-35013: -- Summary: Support temporary model Key: FLINK-35013 URL: https://issues.apache.org/jira/browse/FLINK-35013 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35014) SqlNode to operation conversion for models
Hao Li created FLINK-35014: -- Summary: SqlNode to operation conversion for models Key: FLINK-35014 URL: https://issues.apache.org/jira/browse/FLINK-35014 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35016) Catalog changes for model CRUD
Hao Li created FLINK-35016: -- Summary: Catalog changes for model CRUD Key: FLINK-35016 URL: https://issues.apache.org/jira/browse/FLINK-35016 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35018) ML_EVALUATE function
Hao Li created FLINK-35018: -- Summary: ML_EVALUATE function Key: FLINK-35018 URL: https://issues.apache.org/jira/browse/FLINK-35018 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35017) ML_PREDICT function
Hao Li created FLINK-35017: -- Summary: ML_PREDICT function Key: FLINK-35017 URL: https://issues.apache.org/jira/browse/FLINK-35017 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35019) Support show create model syntax
Hao Li created FLINK-35019: -- Summary: Support show create model syntax Key: FLINK-35019 URL: https://issues.apache.org/jira/browse/FLINK-35019 Project: Flink Issue Type: Sub-task Reporter: Hao Li show options in addition to input/output schema
[jira] [Created] (FLINK-35020) Model Catalog implementation in Hive etc
Hao Li created FLINK-35020: -- Summary: Model Catalog implementation in Hive etc Key: FLINK-35020 URL: https://issues.apache.org/jira/browse/FLINK-35020 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37780) ml_predict sql function skeleton
Hao Li created FLINK-37780: -- Summary: ml_predict sql function skeleton Key: FLINK-37780 URL: https://issues.apache.org/jira/browse/FLINK-37780 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37778) model keyword syntax change
Hao Li created FLINK-37778: -- Summary: model keyword syntax change Key: FLINK-37778 URL: https://issues.apache.org/jira/browse/FLINK-37778 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37779) Add getModel to Flink catalog reader etc
Hao Li created FLINK-37779: -- Summary: Add getModel to Flink catalog reader etc Key: FLINK-37779 URL: https://issues.apache.org/jira/browse/FLINK-37779 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37777) FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hao Li created FLINK-37777: -- Summary: FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design Key: FLINK-37777 URL: https://issues.apache.org/jira/browse/FLINK-37777 Project: Flink Issue Type: New Feature Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime Reporter: Hao Li Implement ml_predict and ml_evaluate functions in https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
[jira] [Created] (FLINK-37799) Flink native model predict runtime implementation
Hao Li created FLINK-37799: -- Summary: Flink native model predict runtime implementation Key: FLINK-37799 URL: https://issues.apache.org/jira/browse/FLINK-37799 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37797) Documentation
Hao Li created FLINK-37797: -- Summary: Documentation Key: FLINK-37797 URL: https://issues.apache.org/jira/browse/FLINK-37797 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37789) Integrate with sqlvalidator
Hao Li created FLINK-37789: -- Summary: Integrate with sqlvalidator Key: FLINK-37789 URL: https://issues.apache.org/jira/browse/FLINK-37789 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37791) Integrate with sql to rel converter
Hao Li created FLINK-37791: -- Summary: Integrate with sql to rel converter Key: FLINK-37791 URL: https://issues.apache.org/jira/browse/FLINK-37791 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37790) Add model factory related interfaces/classes
Hao Li created FLINK-37790: -- Summary: Add model factory related interfaces/classes Key: FLINK-37790 URL: https://issues.apache.org/jira/browse/FLINK-37790 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37792) Physical rewrite for ml_predict
Hao Li created FLINK-37792: -- Summary: Physical rewrite for ml_predict Key: FLINK-37792 URL: https://issues.apache.org/jira/browse/FLINK-37792 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37794) ml_evaluate sql function skeleton
Hao Li created FLINK-37794: -- Summary: ml_evaluate sql function skeleton Key: FLINK-37794 URL: https://issues.apache.org/jira/browse/FLINK-37794 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37795) Physical rewrite for ml_evaluate
Hao Li created FLINK-37795: -- Summary: Physical rewrite for ml_evaluate Key: FLINK-37795 URL: https://issues.apache.org/jira/browse/FLINK-37795 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37793) Codegen for ml_predict
Hao Li created FLINK-37793: -- Summary: Codegen for ml_predict Key: FLINK-37793 URL: https://issues.apache.org/jira/browse/FLINK-37793 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37796) Codegen for ml_evaluate
Hao Li created FLINK-37796: -- Summary: Codegen for ml_evaluate Key: FLINK-37796 URL: https://issues.apache.org/jira/browse/FLINK-37796 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37819) Add column expansion test for model TVF
Hao Li created FLINK-37819: -- Summary: Add column expansion test for model TVF Key: FLINK-37819 URL: https://issues.apache.org/jira/browse/FLINK-37819 Project: Flink Issue Type: Sub-task Reporter: Hao Li https://github.com/apache/flink/pull/26553/files#r2097261694
[jira] [Created] (FLINK-37849) Model provider factory discover
Hao Li created FLINK-37849: -- Summary: Model provider factory discover Key: FLINK-37849 URL: https://issues.apache.org/jira/browse/FLINK-37849 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37978) Change max-buffer config to max-concurrent-operations
Hao Li created FLINK-37978: -- Summary: Change max-buffer config to max-concurrent-operations Key: FLINK-37978 URL: https://issues.apache.org/jira/browse/FLINK-37978 Project: Flink Issue Type: Sub-task Reporter: Hao Li Change max-buffer config to max-concurrent-operations
[jira] [Created] (FLINK-37929) Support cdc stream for ml_predict
Hao Li created FLINK-37929: -- Summary: Support cdc stream for ml_predict Key: FLINK-37929 URL: https://issues.apache.org/jira/browse/FLINK-37929 Project: Flink Issue Type: Sub-task Reporter: Hao Li Support cdc stream for ml_predict if user can indicate model is deterministic through config
[jira] [Created] (FLINK-37928) Support only insert only stream for ml_predict
Hao Li created FLINK-37928: -- Summary: Support only insert only stream for ml_predict Key: FLINK-37928 URL: https://issues.apache.org/jira/browse/FLINK-37928 Project: Flink Issue Type: Sub-task Reporter: Hao Li Since ml_predict isn't deterministic by nature
[jira] [Created] (FLINK-37939) Metrics for ml predict function
Hao Li created FLINK-37939: -- Summary: Metrics for ml predict function Key: FLINK-37939 URL: https://issues.apache.org/jira/browse/FLINK-37939 Project: Flink Issue Type: Sub-task Reporter: Hao Li request counter; latency gauge or histogram; success/failure counter
[jira] [Created] (FLINK-37938) ml evaluate implementation for different task types
Hao Li created FLINK-37938: -- Summary: ml evaluate implementation for different task types Key: FLINK-37938 URL: https://issues.apache.org/jira/browse/FLINK-37938 Project: Flink Issue Type: Sub-task Reporter: Hao Li ml evaluate implementation for different task types
[jira] [Created] (FLINK-38067) Release Testing: Verify FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hao Li created FLINK-38067: -- Summary: Release Testing: Verify FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design Key: FLINK-38067 URL: https://issues.apache.org/jira/browse/FLINK-38067 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38068) Fix distribution
Hao Li created FLINK-38068: -- Summary: Fix distribution Key: FLINK-38068 URL: https://issues.apache.org/jira/browse/FLINK-38068 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38066) Release Testing: Verify FLIP-437: Support ML Models in Flink SQL
Hao Li created FLINK-38066: -- Summary: Release Testing: Verify FLIP-437: Support ML Models in Flink SQL Key: FLINK-38066 URL: https://issues.apache.org/jira/browse/FLINK-38066 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38084) Update model provider download doc
Hao Li created FLINK-38084: -- Summary: Update model provider download doc Key: FLINK-38084 URL: https://issues.apache.org/jira/browse/FLINK-38084 Project: Flink Issue Type: Sub-task Reporter: Hao Li See [https://github.com/apache/flink/pull/26770#discussion_r2194229075.] Add download doc once openai provider jar is published in 2.1 release.
[jira] [Created] (FLINK-38104) FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hao Li created FLINK-38104: -- Summary: FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API Key: FLINK-38104 URL: https://issues.apache.org/jira/browse/FLINK-38104 Project: Flink Issue Type: New Feature Reporter: Hao Li Implement https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API
[jira] [Created] (FLINK-38020) NonTimeRangeUnboundedPrecedingFunction failing with NPE
Hao Li created FLINK-38020: -- Summary: NonTimeRangeUnboundedPrecedingFunction failing with NPE Key: FLINK-38020 URL: https://issues.apache.org/jira/browse/FLINK-38020 Project: Flink Issue Type: Bug Reporter: Hao Li [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/over/NonTimeRangeUnboundedPrecedingFunction.java#L425-L432] is missing a call to `setAccumulator`. [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/test/java/org/apache/flink/table/runtime/operators/over/NonTimeRangeUnboundedPrecedingFunctionTest.java#L55] didn't catch it because `sum` in [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/test/java/org/apache/flink/table/runtime/operators/over/SumAggsHandleFunction.java#L30] is a primitive long. It will fail with an NPE if changed to `Long`.
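The masking effect this report describes can be reproduced in isolation: a primitive accumulator field silently defaults to 0, so a code path that forgets to restore state still "works", while a boxed field defaults to null and the implicit unboxing throws. This is a toy illustration, not the Flink operator or test code:

```java
// Toy accumulators mirroring the long-vs-Long difference from the report.
class PrimitiveAcc {
    long sum; // defaults to 0L even if never restored -> bug goes unnoticed
    long current() { return sum; }
}

class BoxedAcc {
    Long sum; // defaults to null if never restored
    long current() { return sum; } // unboxing null throws NullPointerException
}
```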
[jira] [Created] (FLINK-38117) Add anonymous ContextResolvedModel
Hao Li created FLINK-38117: -- Summary: Add anonymous ContextResolvedModel Key: FLINK-38117 URL: https://issues.apache.org/jira/browse/FLINK-38117 Project: Flink Issue Type: Sub-task Reporter: Hao Li