[DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Dev,

Mingge, Chris and I would like to start a discussion about FLIP-437: Support ML Models in Flink SQL. This FLIP proposes supporting machine learning models in Flink SQL syntax so that users can CRUD models with Flink SQL and use models on Flink to do prediction with Flink data. The FLIP also proposes new model entities and changes to the catalog interface to support model CRUD operations in the catalog.

For more details, see FLIP-437 [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL

Thanks,
Mingge, Chris & Hao
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for your questions. These are good questions!

1. The polymorphic table function I was referring to takes a table as input and outputs a table. So the syntax would be like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported yet on Flink. So before it's supported, one option for the predict function is using a table function which can output multiple columns
```
SELECT * FROM my_table, LATERAL VIEW (ML_PREDICT('model', col1, col2))
```

2. Good question. Type inference is hard for the `ML_PREDICT` function because it takes a model name string as input. I can think of three ways of doing type inference for it.
1). Treat the `ML_PREDICT` function as something special: during SQL parsing or planning time, if it's encountered, we look up the model named by the first argument in the catalog. Then we can infer the input/output types for the function.
2). We can define a `model` keyword and use that in the predict function to indicate the argument refers to a model. So it's like `ML_PREDICT(model 'my_model', col1, col2)`
3). We can create a special type of table function, maybe called `ModelFunction`, which can resolve the model type inference by handling it specially during parsing or planning time.
1) is hacky, 2) isn't supported in Flink for functions, 3) might be a good option.

3. I sketched the `ML_PREDICT` function for inference. But there are limitations of the function mentioned in 1 and 2. So maybe we don't need to introduce them as built-in functions until polymorphic table functions are supported and we can properly deal with type inference. After that, defining a user-defined model function should also be straightforward.

4. For model types, do you mean 'remote', 'import', 'native' models or other things?

5. We could support popular providers such as 'azureml', 'vertexai', 'googleai' as long as we support the `ML_PREDICT` function.
Users should be able to implement 3rd-party providers if they can implement a function handling the input/output for the provider. I think for the model functions, there are still dependencies or hacks we need to sort out as a built-in function. Maybe we can separate that as a follow up if we want to have it built-in and focus on the model syntax for this FLIP? Thanks, Hao On Tue, Mar 12, 2024 at 10:33 PM Jark Wu wrote: > Hi Minge, Chris, Hao, > > Thanks for proposing this interesting idea. I think this is a nice step > towards > the AI world for Apache Flink. I don't know much about AI/ML, so I may have > some stupid questions. > > 1. Could you tell more about why polymorphism table function (PTF) doesn't > work and do we have plan to use PTF as model functions? > > 2. What kind of object does the model map to in SQL? A relation or a data > type? > It looks like a data type because we use it as a parameter of the table > function. > If it is a data type, how does it cooperate with type inference[1]? > > 3. What built-in model functions will we support? How to define a > user-defined model function? > > 4. What built-in model types will we support? How to define a user-defined > model type? > > 5. Regarding the remote model, what providers will we support? Can users > implement > 3rd-party providers except OpenAI? > > Best, > Jark > > [1]: > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference
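The catalog-lookup approach Hao lists as option 1) — intercept `ML_PREDICT`, resolve the model name from the catalog, and derive the function's input/output types from the stored model schema — can be sketched in a few lines. Everything below (`ToyCatalog`, `resolve_ml_predict`, the schema shapes) is hypothetical illustration of the planner-side logic, not Flink API:

```python
from dataclasses import dataclass

@dataclass
class ModelSchema:
    inputs: list   # (name, type) pairs the model expects
    outputs: list  # (name, type) pairs the model produces

class ToyCatalog:
    """Stand-in for a catalog that can store model definitions."""
    def __init__(self):
        self._models = {}

    def register_model(self, name, schema):
        self._models[name] = schema

    def get_model(self, name):
        if name not in self._models:
            raise LookupError(f"model '{name}' not found in catalog")
        return self._models[name]

def resolve_ml_predict(catalog, model_name, arg_types):
    """Infer ML_PREDICT's result type by looking the model up in the catalog."""
    schema = catalog.get_model(model_name)
    expected = [t for _, t in schema.inputs]
    if arg_types != expected:
        raise TypeError(f"ML_PREDICT arguments {arg_types} do not match "
                        f"model inputs {expected}")
    return schema.outputs

catalog = ToyCatalog()
catalog.register_model(
    "my_model",
    ModelSchema(inputs=[("f1", "DOUBLE"), ("f2", "DOUBLE")],
                outputs=[("label", "STRING")]))

# Planning ML_PREDICT('my_model', col1, col2) where both columns are DOUBLE:
print(resolve_ml_predict(catalog, "my_model", ["DOUBLE", "DOUBLE"]))
```

This also shows why the option is "hacky": the function's type depends on a catalog lookup keyed by a string literal, which ordinary function type inference does not do.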
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for the pointers. They're very helpful.

1. Looks like `tumble`, `hopping` are keywords in the Calcite parser. And the syntax `cumulate(Table my_table, ...)` needs to get table information from the catalog somewhere for type validation etc. Can you send me some pointers to where the function gets the table information?

2. The ideal syntax for the model function, I think, would be `ML_PREDICT(MODEL <model_name>, {table | (query_stmt)})`. I think with special handling of the `ML_PREDICT` function in the parser/planner, maybe we can do this like window functions. But to support the `MODEL` keyword, we need a Calcite parser change I guess. Also, is it possible to support a query in window functions in addition to a table?

For the Redshift syntax, I'm not sure of the purpose of defining the function name with the model. Is it to define the function input/output schema? We have the schema in our create model syntax and `ML_PREDICT` can handle it by getting the model definition. I think our syntax is more concise in having a generic prediction function. I also did some research and it's the syntax used by Databricks `ai_query` [1], Snowflake `predict` [2], Azureml `predict` [3].

[1]: https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
[2]: https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
[3]: https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current

Thanks,
Hao

On Wed, Mar 13, 2024 at 8:57 PM Jark Wu wrote: > Hi Mingge, Hao, > > Thanks for your replies. > > > PTF is actually the ideal approach for model functions, and we do have > the plans to use PTF for > all model functions (including prediction, evaluation etc..) once the PTF > is supported in FlinkSQL > confluent extension. > > It sounds that PTF is the ideal way and table function is a temporary > solution which will be dropped in the future.
> I'm not sure whether we can implement it using PTF in Flink SQL. But we > have implemented window > functions using PTF[1]. And introduced a new window function (called > CUMULATE[2]) in Flink SQL based > on this. I think it might work to use PTF and implement model function > syntax like this: > > SELECT * FROM TABLE(ML_PREDICT( > TABLE my_table, > my_model, > col1, > col2 > )); > > Besides, did you consider following the way of AWS Redshift which defines > model function with the model itself together? > IIUC, a model is a black-box which defines input parameters and output > parameters which can be modeled into functions. > > > Best, > Jark > > [1]: > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session > [2]: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows > [3]: > > https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark,

Thanks for the pointer. Sorry for the confusion: I meant how the table name in a window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVFs specially in the parser/planner so that the catalog is looked up when they are seen?

For what a model is, I'm wondering if it has to be a datatype or relation. Can it be another kind of citizen parallel to datatype/relation/function/db? Redshift also supports the `show models` operation, so it seems it's treated specially as well?

The reasons I don't like Redshift's syntax are:
1. It's a bit verbose: you need to think of a model name as well as a function name, and the function name also needs to be unique.
2. More importantly, the prediction function isn't the only function that can operate on models. There could be a set of inference functions [1] and evaluation functions [2] which can operate on models. It's hard to specify all of them in model creation.

[1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
[2]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate

Thanks,
Hao

On Thu, Mar 14, 2024 at 8:18 PM Jark Wu wrote: > Hi Hao, > > > Can you send me some pointers > where the function gets the table information? > > Here is the code of cumulate window type checking [1]. > > > Also is it possible to support a query in > window functions in addition to a table? > > Yes. It is not allowed in TVF. > > Thanks for the syntax links of other systems. The reason I prefer the > Redshift way is > that it avoids introducing Model as a relation or datatype (referenced as a > parameter in TVF). > Model is not a relation because it cannot be queried directly (e.g., SELECT * > FROM model). > I'm also confused about making Model a datatype, because I don't know > what class the > model parameter of the eval method of TableFunction/ScalarFunction should > be.
By defining > the function with the model, users can directly invoke the function without > reference to the model name. > > Best, > Jark > > [1]: > > https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
create a FLIP for just this syntax question and collaborate > >> with the Calcite community on this. Supporting the syntax of Oracle > >> shouldn't be too hard to convince at least as parser parameter. > >> > >> Regards, > >> Timo > >> > >> [1] > >> > >> > https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-440%3A+User-defined+Polymorphic+Table+Functions > >> [2] > >> > >> > https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_TF.html#GUID-0F66E239-DE77-4C0E-AC76-D5B632AB8072 > >> [3] > https://oracle-base.com/articles/18c/polymorphic-table-functions-18c > >> > >> > >> > >> On 20.03.24 17:22, Mingge Deng wrote: > >> > Thanks Jark for all the insightful comments. > >> > > >> > We have updated the proposal per our offline discussions: > >> > 1. Model will be treated as a new relation in FlinkSQL. > >> > 2. Include the common ML predict and evaluate functions into the open > >> > source flink to complete the user journey. > >> > And we should be able to extend the calcite SqlTableFunction to > >> support > >> > these two ML functions. > >> > > >> > Best, > >> > Mingge > >> > > >> > On Mon, Mar 18, 2024 at 7:05 PM Jark Wu wrote: > >> > > >> >> Hi Hao, > >> >> > >> >>> I meant how the table name > >> >> in window TVF gets translated to `SqlCallingBinding`. Probably we > need > >> to > >> >> fetch the table definition from the catalog somewhere. Do we treat > >> those > >> >> window TVF specially in parser/planner so that catalog is looked up > >> when > >> >> they are seen? > >> >> > >> >> The table names are resolved and validated by Calcite SqlValidator. > We > >> >> don't need to fetch from catalog manually. > >> >> The specific checking logic of cumulate window happens in > >> >> SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes. > >> >> The return type of SqlCumulateTableFunction is defined in > >> >> #getRowTypeInference() method.
> >> >> Both are public interfaces provided by Calcite and it seems it's not > >> >> specially handled in parser/planner. > >> >> > >> >> I didn't try that, but my gut feeling is that the framework is ready to > >> >> extend a customized TVF. > >> >> > >> >>> For what model is, I'm wondering if it has to be datatype or relation. > >> >> Can > >> >> it be another kind of citizen parallel to > >> datatype/relation/function/db? > >> >> Redshift also supports `show models` operation, so it seems it's > >> treated > >> >> specially as well? > >> >> > >> >> If it is an entity only used in catalog scope (e.g., show xxx, create > >> xxx, > >> >> drop xxx), it is fine to introduce it. > >> >> We have introduced such one before, called Module: "load module", "show > >> >> modules" [1]. > >> >> But if we want to use Model in TVF parameters, it means it has to be a > >> >> relation or datatype, because > >> >> that is what it only accepts now. > >> >> > >> >> Thanks for sharing the reason of preferring TVF instead of Redshift > >> way. It > >> >> sounds reasonable to me. > >> >> > >> >> Best, > >> >> Jark > >> >> > >> >> [1]: > >> >> > >> > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
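Jark's point about catalog-scope entities (create xxx, drop xxx, show xxx — like the existing "LOAD MODULE" / "SHOW MODULES") can be made concrete with a small life-cycle sketch. All names here (`ModelCatalog`, the method signatures) are illustrative, not a proposed Flink interface:

```python
class ModelCatalog:
    """Toy catalog treating MODEL as a catalog-scope entity with a
    CREATE / DROP / SHOW life cycle, analogous to modules in Flink."""

    def __init__(self):
        self._models = {}

    def create_model(self, name, definition, ignore_if_exists=False):
        if name in self._models and not ignore_if_exists:
            raise ValueError(f"model '{name}' already exists")
        self._models.setdefault(name, definition)

    def drop_model(self, name, ignore_if_not_exists=False):
        if name not in self._models:
            if ignore_if_not_exists:
                return
            raise LookupError(f"model '{name}' does not exist")
        del self._models[name]

    def show_models(self):
        return sorted(self._models)

cat = ModelCatalog()
cat.create_model("classifier_model", {"provider": "openai"})
print(cat.show_models())
```

The catch Jark raises still applies: this life cycle alone doesn't make a model usable as a TVF parameter, which is why the relation-vs-datatype question matters.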
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Ahmed,

Looks like the feature freeze time for the 1.20 release is June 15th. We can definitely get the model DDL into 1.20. For the predict and evaluate functions, if we can't get them into the 1.20 release, we can get them into the 1.21 release for sure.

Thanks,
Hao

On Mon, Mar 25, 2024 at 1:25 AM Timo Walther wrote: > Hi Jark and Hao, > > thanks for the information, Jark! Great that the Calcite community > already fixed the problem for us. +1 to adopt the simplified syntax > asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if > upgrading Calcite is too much work right now? > > > Is `DESCRIPTOR` a must in the syntax? > > Yes, we should still stick to the standard as much as possible and all > vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal > arguments. So the final syntax of this discussion would be: > > > SELECT f1, f2, label FROM >ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2)) > > SELECT * FROM >ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2)) > > Please double check if this is implementable with the current stack. I > fear the parser or validator might not like the "identifier" argument? > > Make sure that also these variations are supported: > > SELECT f1, f2, label FROM >ML_PREDICT( > TABLE `my_data`, > my_cat.my_db.classifier_model, > DESCRIPTOR(f1, f2)) > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE `my_data`, > model => my_cat.my_db.classifier_model, > args => DESCRIPTOR(f1, f2)) > > It might be safer and more future proof to wrap a MODEL() function > around it.
This would be more in sync with the standard that actually > still requires to put a TABLE() around the input argument: > > ML_PREDICT(TABLE(`my_data`) PARTITIONED BY c1 ORDERED BY c1, ) > > So the safest option would be the long-term solution: > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE(my_data), > model => MODEL(my_cat.my_db.classifier_model), > args => DESCRIPTOR(f1, f2)) > > But I'm fine with this if others have a strong opinion: > > SELECT f1, f2, label FROM >ML_PREDICT( > input => TABLE `my_data`, > model => my_cat.my_db.classifier_model, > args => DESCRIPTOR(f1, f2)) > > Some feedback for the remainder of the FLIP: > > 1) Simplify catalog objects > > I would suggest to drop: > CatalogModel.getModelKind() > CatalogModel.getModelTask() > > A catalog object should fully resemble the DDL. And since the DDL puts > those properties in the WITH clause, the catalog object should do the same > (i.e. put them into the `getModelOptions()`). Btw renaming this method > to just `getOptions()` for consistency should be good as well. > Internally, we can still provide enums for these frequently used > classes. Similar to what we do in `FactoryUtil` for other frequently > used options. > > Remove `getDescription()` and `getDetailedDescription()`. They were a > mistake for CatalogTable and should actually be deprecated. They got > replaced by `getComment()` which is sufficient. > > 2) CREATE TEMPORARY MODEL is not supported. > > This is an unnecessary restriction. We should support temporary versions > of these catalog objects as well for consistency. Adding support for > this should be straightforward. > > 3) { DESCRIBE | DESC } MODEL [catalog_name.][database_name.]model_name > > I would suggest we support `SHOW CREATE MODEL` instead. Similar to `SHOW > CREATE TABLE`, this should show all properties. If we support `DESCRIBE > MODEL` it should only list the input parameters similar to `DESCRIBE > TABLE` only shows the columns (not the WITH clause).
> > Regards, > Timo > > > On 23.03.24 13:17, Ahmed Hamdy wrote: > > Hi everyone, > > +1 for this proposal, I believe it is very useful to the minimum, It > would > > be great even having "ML_PREDICT" and "ML_EVALUATE" as built-in PTFs in > > this FLIP as discussed. > > IIUC this will be included in the 1.20 roadmap? > > Best Regards > > Ahmed Hamdy > > > > > > On Fri, 22 Mar 2024 at 23:54, Hao Li wrote: > > > >> Hi Timo and Jark, > >> > >> I agree Oracle's syntax seems concise and more descriptive. For the > >> built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with Jark we > can > >> support them as built-in PTF using `SqlTableFunction` for this FLIP. We > can > >> have a different FLIP discussing user defined PTF and adopt that later > for > >> model functions later. To summarize, the current propos
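Timo's catalog-object simplification — drop `getModelKind()`/`getModelTask()`, fold those properties into the options map so the object mirrors the DDL's WITH clause, and keep only `getOptions()` and `getComment()` — could look roughly like the following toy sketch. It is written in Python purely for illustration; the real `CatalogModel` would be a Java interface in Flink, and the helper name `model_task` is an assumption:

```python
class CatalogModel:
    """Toy catalog object that fully resembles the CREATE MODEL DDL:
    everything from the WITH clause lives in one options map."""

    def __init__(self, input_schema, output_schema, options, comment=None):
        self.input_schema = input_schema    # columns the model consumes
        self.output_schema = output_schema  # columns the model produces
        self._options = dict(options)       # mirrors the DDL WITH clause
        self._comment = comment

    def get_options(self):
        return self._options

    def get_comment(self):
        return self._comment

    # Frequently used options can still be surfaced through helpers,
    # similar in spirit to what FactoryUtil does for table options.
    def model_task(self):
        return self._options.get("task")

model = CatalogModel(
    input_schema=[("f1", "DOUBLE"), ("f2", "DOUBLE")],
    output_schema=[("label", "STRING")],
    options={"task": "classification", "provider": "openai"},
    comment="spam classifier",
)
print(model.model_task())
```

Note how kind/task are deduced from the options rather than being first-class accessors, matching point 1) of the feedback.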
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Timo,

> Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument?

I checked this: currently the validator throws an exception trying to get the fully qualified name for `classifier_model`. But since `SqlValidatorImpl` is implemented in Flink, we should be able to fix this. The only caveat is that if the full model path is not provided, the qualifier is interpreted as a column. We should be able to handle this specially by rewriting the `ml_predict` function to add the catalog and database name in `FlinkCalciteSqlValidator`, though.

> SELECT f1, f2, label FROM ML_PREDICT( TABLE `my_data`, my_cat.my_db.classifier_model, DESCRIPTOR(f1, f2)) SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2))

I verified these can be parsed. The problem is in the validator for the qualifier, as mentioned above.

> So the safest option would be the long-term solution: SELECT f1, f2, label FROM ML_PREDICT( input => TABLE(my_data), model => MODEL(my_cat.my_db.classifier_model), args => DESCRIPTOR(f1, f2))

`TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't work since `TABLE` and `MODEL` are already keywords in Calcite used by `CREATE TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and will be treated as a function. So I think

SELECT f1, f2, label FROM ML_PREDICT( input => TABLE `my_data`, model => my_cat.my_db.classifier_model, args => DESCRIPTOR(f1, f2))

should be fine for now.

For the syntax part:
1). Sounds good. We can drop model task and model kind from the definition. They can be deduced from the options.
2). Sure. We can add temporary models.
3). Makes sense. We can use `show create model <model_name>` to display all information and `describe model <model_name>` to show the input/output schema.

Thanks,
Hao

On Mon, Mar 25, 2024 at 3:21 PM Hao Li wrote: > Hi Ahmed, > > Looks like the feature freeze time for 1.20 release is June 15th.
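The rewrite Hao describes — expanding a partial model path with the current catalog and database before validation — boils down to standard identifier qualification. A minimal sketch, where the `qualify` helper and its return shape are assumptions for illustration:

```python
def qualify(identifier, current_catalog, current_database):
    """Expand a 1-, 2- or 3-part identifier to [catalog, database, object],
    filling missing parts from the session's current catalog/database."""
    parts = identifier.split(".")
    if len(parts) == 1:
        return [current_catalog, current_database, parts[0]]
    if len(parts) == 2:
        return [current_catalog] + parts
    if len(parts) == 3:
        return parts
    raise ValueError(f"too many parts in identifier: {identifier}")

# A bare model name resolves against the session defaults:
print(qualify("classifier_model", "my_cat", "my_db"))
```

With this expansion done before lookup, a bare `classifier_model` argument can be resolved as a model rather than being misread as a column.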
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Timo, Yeah. For `primary key` and `from table(...)` those are explicitly matched in parser: [1]. > SELECT f1, f2, label FROM ML_PREDICT( input => `my_data`, model => `my_cat`.`my_db`.`classifier_model`, args => DESCRIPTOR(f1, f2)); This named argument syntax looks good to me. It can be supported together with SELECT f1, f2, label FROM ML_PREDICT(`my_data`, `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); Sure. Will let you know once updated the FLIP. [1] https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 Thanks, Hao On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote: > Hi Hao, > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > > work since `TABLE` and `MODEL` are already key words > > This argument doesn't count. The parser supports introducing keywords > that are still non-reserved. For example, this enables using "key" for > both primary key and a column name: > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > WITH ('connector' = 'datagen'); > > SELECT i AS key FROM t; > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what > the standard dictates. But for now, let's use the most compact syntax > possible which is also in sync with Oracle. > > TLDR: We allow identifiers as arguments for PTFs which are expanded with > catalog and database if necessary. Those identifier arguments translate > to catalog lookups for table and models. The ML_ functions will make > sure that the arguments are of correct type model or table. > > SELECT f1, f2, label FROM >ML_PREDICT( > input => `my_data`, > model => `my_cat`.`my_db`.`classifier_model`, > args => DESCRIPTOR(f1, f2)); > > So this will allow us to also use in the future: > > SELECT * FROM poly_func(table1); > > Same support as Oracle [1]. Very concise. > > Let me know when you updated the FLIP for a final review before voting. > > Do others have additional objections? 
> > Regards, > Timo > > [1] > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Hi Jark, I think we can start with supporting popular model providers such as openai, azureml, sagemaker for remote models. Thanks, Hao On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > Thanks for the PoC and updating, > > The final syntax looks good to me, at least it is a nice and concise first > step. > > SELECT f1, f2, label FROM >ML_PREDICT( > input => `my_data`, > model => `my_cat`.`my_db`.`classifier_model`, > args => DESCRIPTOR(f1, f2)); > > Besides, what built-in models will we support in the FLIP? This might be > important > because it relates to what use cases can run with the new Flink version out > of the box. > > Best, > Jark > > On Wed, 27 Mar 2024 at 01:10, Hao Li wrote: > > > Hi Timo, > > > > Yeah. For `primary key` and `from table(...)` those are explicitly > matched > > in parser: [1]. > > > > > SELECT f1, f2, label FROM > >ML_PREDICT( > > input => `my_data`, > > model => `my_cat`.`my_db`.`classifier_model`, > > args => DESCRIPTOR(f1, f2)); > > > > This named argument syntax looks good to me. It can be supported together > > with > > > > SELECT f1, f2, label FROM ML_PREDICT(`my_data`, > > `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); > > > > Sure. Will let you know once updated the FLIP. > > > > [1] > > > > > https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 > > > > Thanks, > > Hao > > > > On Tue, Mar 26, 2024 at 4:15 AM Timo Walther wrote: > > > > > Hi Hao, > > > > > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't > > > > work since `TABLE` and `MODEL` are already key words > > > > > > This argument doesn't count. The parser supports introducing keywords > > > that are still non-reserved. 
For example, this enables using "key" for > > > both primary key and a column name: > > > > > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > > > WITH ('connector' = 'datagen'); > > > > > > SELECT i AS key FROM t; > > > > > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what > > > the standard dictates. But for now, let's use the most compact syntax > > > possible which is also in sync with Oracle. > > > > > > TLDR: We allow identifiers as arguments for PTFs which are expanded > with > > > catalog and database if necessary. Those identifier arguments translate > > > to catalog lookups for table and models. The ML_ functions will make > > > sure that the arguments are of correct type model or table. > > > > > > SELECT f1, f2, label FROM > > >ML_PREDICT( > > > input => `my_data`, > > > model => `my_cat`.`my_db`.`classifier_model`, > > > args => DESCRIPTOR(f1, f2)); > > > > > > So this will allow us to also use in the future: > > > > > > SELECT * FROM poly_func(table1); > > > > > > Same support as Oracle [1]. Very concise. > > > > > > Let me know when you updated the FLIP for a final review before voting. > > > > > > Do others have additional objections? > > > > > > Regards, > > > Timo > > > > > > [1] > > > > > > > > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html > > > > > > > > > > > > On 25.03.24 23:40, Hao Li wrote: > > > > Hi Timo, > > > > > > > >> Please double check if this is implementable with the current > stack. I > > > > fear the parser or validator might not like the "identifier" > argument? > > > > > > > > I checked this, currently the validator throws an exception trying to > > get > > > > the full qualifier name for `classifier_model`. But since > > > > `SqlValidatorImpl` is implemented in Flink, we should be able to fix > > > this. > > > > The only caveat is that if the full model path is not provided, > > > > the qualifier is interpreted as a column.
We should be able to > special > > > > handle this by rewriting the `ml_predict` function to add the catalog > > and > > > > database name in `FlinkCalciteSqlValidator` though. > > > > > > > >> SELECT f1, f2, label FROM >
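Stripped of the reply markers, the two equivalent call forms being agreed on in the messages above are:

```sql
-- Named-argument form:
SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));

-- Positional form:
SELECT f1, f2, label FROM
  ML_PREDICT(`my_data`, `my_cat`.`my_db`.`classifier_model`, DESCRIPTOR(f1, f2));
```

Both forms are taken verbatim from the thread; identifier arguments are expanded with the current catalog and database when a full path is not given.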
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Thanks Timo. I'll start a vote tomorrow if no further discussion. Thanks, Hao On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote: > Hi everyone, > > I updated the FLIP according to this discussion. > > @Hao Li: Let me know if I made a mistake somewhere. I added some > additional explaining comments about the new PTF syntax. > > There are no further objections from my side. If nobody objects, Hao > feel free to start the voting tomorrow. > > Regards, > Timo > > > On 28.03.24 16:30, Jark Wu wrote: > > Thanks, Hao, > > > > Sounds good to me. > > > > Best, > > Jark > > > > On Thu, 28 Mar 2024 at 01:02, Hao Li wrote: > > > >> Hi Jark, > >> > >> I think we can start with supporting popular model providers such as > >> openai, azureml, sagemaker for remote models. > >> > >> Thanks, > >> Hao > >> > >> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: > >> > >>> Thanks for the PoC and updating, > >>> > >>> The final syntax looks good to me, at least it is a nice and concise > >> first > >>> step. > >>> > >>> SELECT f1, f2, label FROM > >>> ML_PREDICT( > >>> input => `my_data`, > >>> model => `my_cat`.`my_db`.`classifier_model`, > >>> args => DESCRIPTOR(f1, f2)); > >>> > >>> Besides, what built-in models will we support in the FLIP? This might > be > >>> important > >>> because it relates to what use cases can run with the new Flink version > >> out > >>> of the box. > >>> > >>> Best, > >>> Jark > >>> > >>> On Wed, 27 Mar 2024 at 01:10, Hao Li wrote: > >>> > >>>> Hi Timo, > >>>> > >>>> Yeah. For `primary key` and `from table(...)` those are explicitly > >>> matched > >>>> in parser: [1]. > >>>> > >>>>> SELECT f1, f2, label FROM > >>>> ML_PREDICT( > >>>> input => `my_data`, > >>>> model => `my_cat`.`my_db`.`classifier_model`, > >>>> args => DESCRIPTOR(f1, f2)); > >>>> > >>>> This named argument syntax looks good to me. 
It can be supported > >> together > >>>> with > >>>> > >>>> SELECT f1, f2, label FROM ML_PREDICT(`my_data`, > >>>> `my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2)); > >>>> > >>>> Sure. Will let you know once updated the FLIP. > >>>> > >>>> [1] > >>>> > >>>> > >>> > >> > https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814 > >>>> > >>>> Thanks, > >>>> Hao > >>>> > >>>> On Tue, Mar 26, 2024 at 4:15 AM Timo Walther > >> wrote: > >>>> > >>>>> Hi Hao, > >>>>> > >>>>> > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` > >> doesn't > >>>>> > work since `TABLE` and `MODEL` are already key words > >>>>> > >>>>> This argument doesn't count. The parser supports introducing keywords > >>>>> that are still non-reserved. For example, this enables using "key" > >> for > >>>>> both primary key and a column name: > >>>>> > >>>>> CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED) > >>>>> WITH ('connector' = 'datagen'); > >>>>> > >>>>> SELECT i AS key FROM t; > >>>>> > >>>>> I'm sure we will introduce `TABLE(my_data)` eventually as this is > >> what > >>>>> the standard dictates. But for now, let's use the most compact syntax > >>>>> possible which is also in sync with Oracle. > >>>>> > >>>>> TLDR: We allow identifiers as arguments for PTFs which are expanded > >>> with > >>>>> catalog and database if necessary. Those identifier arguments > >> translate > >>>>> to catalog lookups for table and models. The ML_ functions will make > >>>>> sure that the arguments are of correct type model or table. > >>>>> > >
[VOTE] FLIP-437: Support ML Models in Flink SQL
Hi devs, I'd like to start a vote on the FLIP-437: Support ML Models in Flink SQL [1]. The discussion thread is here [2]. The vote will be open for at least 72 hours unless there is an objection or insufficient votes. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn Thanks, Hao
Re: [VOTE] FLIP-437: Support ML Models in Flink SQL
Thanks David Radley and David Moravek for the comments. I'll reply in the discussion thread. Hao On Wed, Apr 3, 2024 at 5:45 AM David Morávek wrote: > +1 (binding) > > My only suggestion would be to move Catalog changes into a separate > interface to allow us to begin with lower stability guarantees. Existing > Catalogs would be able to opt-in by implementing it. It's a minor thing > though, overall the FLIP is solid and the direction is pretty exciting. > > Best, > D. > > On Wed, Apr 3, 2024 at 2:31 AM David Radley > wrote: > > > Hi Hao, > > I don’t think this counts as an objection, I have some comments. I should > > have put this on the discussion thread earlier but have just got to this. > > - I suggest we can put a model version in the model resource. Versions > are > > notoriously difficult to add later; I don’t think we want to proliferate > > differently named models as a model mutates. We may want to work with > > non-latest models. > > - I see that the model name is the unique identifier. I realise this > would > > move away from the Oracle syntax – so may not be feasible short term; > but I > > wonder if we can have: > > - a uuid as the main identifier and the model name as an attribute. > > or > > - a namespace (or something like a system of origin) > > to help organise models with the same name. > > - does the model have an owner? I assume that Flink model resource is the > > master of the model? I imagine in the future that a model that comes in > via > > a new connector could be kept up to date with the external model and > would > > not be allowed to be changed by anything other than the connector. > > > >Kind regards, David. > > > > From: Hao Li > > Date: Friday, 29 March 2024 at 16:30 > > To: dev@flink.apache.org > > Subject: [EXTERNAL] [VOTE] FLIP-437: Support ML Models in Flink SQL > > Hi devs, > > > > I'd like to start a vote on the FLIP-437: Support ML Models in Flink > > SQL [1]. The discussion thread is here [2]. 
> > > > The vote will be open for at least 72 hours unless there is an objection > or > > insufficient votes. > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn > > > > Thanks, > > Hao > > > > Unless otherwise stated above: > > > > IBM United Kingdom Limited > > Registered in England and Wales with number 741598 > > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU > > >
Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Cross-posting David Radley's comments here from the voting thread: > I don’t think this counts as an objection, I have some comments. I should have put this on the discussion thread earlier but have just got to this. > - I suggest we can put a model version in the model resource. Versions are notoriously difficult to add later; I don’t think we want to proliferate differently named models as a model mutates. We may want to work with non-latest models. > - I see that the model name is the unique identifier. I realise this would move away from the Oracle syntax – so may not be feasible short term; but I wonder if we can have: > - a uuid as the main identifier and the model name as an attribute. > or > - a namespace (or something like a system of origin) > to help organise models with the same name. > - does the model have an owner? I assume that Flink model resource is the master of the model? I imagine in the future that a model that comes in via a new connector could be kept up to date with the external model and would not be allowed to be changed by anything other than the connector. Thanks for the comments. I agree supporting the model version is important. I think we could support versioning without changing the overall syntax by appending a version number/name to the model name. Catalog implementations can handle the versions. For example, CREATE MODEL `my-model$1`... "$1" would imply it's version 1. If no version is provided, we can auto-increment the version if the model name exists already or create the first version if the model name doesn't exist yet. As for model ownership, I'm not entirely sure about the use case and how it should be controlled. It could be controlled from the user side through ACL/RBAC or some way in the catalog I guess. Maybe we can follow up on this as the requirement or use case becomes clearer. 
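As a sketch, the version-suffix convention described above could look like the following (the INPUT/OUTPUT clauses and options are illustrative assumptions, not something specified in this thread):

```sql
-- Explicit version: the catalog parses the "$1" suffix as version 1.
CREATE MODEL `my-model$1`
INPUT (f1 DOUBLE, f2 DOUBLE)
OUTPUT (label STRING)
WITH ('provider' = 'openai');

-- No suffix: the catalog creates version 1 if `my-model` does not exist yet,
-- or auto-increments to the next version if it already does.
CREATE MODEL `my-model`
INPUT (f1 DOUBLE, f2 DOUBLE)
OUTPUT (label STRING)
WITH ('provider' = 'openai');
```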
Cross-posting David Moravek's comments from the voting thread: > My only suggestion would be to move Catalog changes into a separate > interface to allow us to begin with lower stability guarantees. Existing > Catalogs would be able to opt-in by implementing it. It's a minor thing > though, overall the FLIP is solid and the direction is pretty exciting. I think it's fine to move model-related catalog changes to a separate interface and let the current catalog interface extend it. As model support will be built into Flink, the current catalog interface will need to support model CRUD operations. For my own education, can you elaborate more on how a separate interface will allow us to begin with lower stability guarantees? Thanks, Hao On Thu, Mar 28, 2024 at 10:14 AM Hao Li wrote: > Thanks Timo. I'll start a vote tomorrow if no further discussion. > > Thanks, > Hao > > On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote: > >> Hi everyone, >> >> I updated the FLIP according to this discussion. >> >> @Hao Li: Let me know if I made a mistake somewhere. I added some >> additional explaining comments about the new PTF syntax. >> >> There are no further objections from my side. If nobody objects, Hao >> feel free to start the voting tomorrow. >> >> Regards, >> Timo >> >> >> On 28.03.24 16:30, Jark Wu wrote: >> > Thanks, Hao, >> > >> > Sounds good to me. >> > >> > Best, >> > Jark >> > >> > On Thu, 28 Mar 2024 at 01:02, Hao Li wrote: >> > >> >> Hi Jark, >> >> >> >> I think we can start with supporting popular model providers such as >> >> openai, azureml, sagemaker for remote models. >> >> >> >> Thanks, >> >> Hao >> >> >> >> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote: >> >> >> >>> Thanks for the PoC and updating, >> >>> >> >>> The final syntax looks good to me, at least it is a nice and concise >> >> first >> >>> step. 
>> >>> >> >>> SELECT f1, f2, label FROM >> >>> ML_PREDICT( >> >>> input => `my_data`, >> >>> model => `my_cat`.`my_db`.`classifier_model`, >> >>> args => DESCRIPTOR(f1, f2)); >> >>> >> >>> Besides, what built-in models will we support in the FLIP? This might >> be >> >>> important >> >>> because it relates to what use cases can run with the new Flink >> version >> >> out >> >>> of the box. >> >>> >> >>> Best, >> >>> Jark >> >>> >> >>> On Wed, 27 Mar 2024 at 01:10, Hao Li >> wrote: >> >>> >
Re: [VOTE] FLIP-437: Support ML Models in Flink SQL
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Wed, Apr 3, 2024 at 10:24 AM Hao Li wrote: > Thanks David Radley and David Moravek for the comments. I'll reply in the > discussion thread. > > Hao > > On Wed, Apr 3, 2024 at 5:45 AM David Morávek wrote: > >> +1 (binding) >> >> My only suggestion would be to move Catalog changes into a separate >> interface to allow us to begin with lower stability guarantees. Existing >> Catalogs would be able to opt-in by implementing it. It's a minor thing >> though, overall the FLIP is solid and the direction is pretty exciting. >> >> Best, >> D. >> >> On Wed, Apr 3, 2024 at 2:31 AM David Radley >> wrote: >> >> > Hi Hao, >> > I don’t think this counts as an objection, I have some comments. I >> should >> > have put this on the discussion thread earlier but have just got to >> this. >> > - I suggest we can put a model version in the model resource. Versions >> are >> > notoriously difficult to add later; I don’t think we want to proliferate >> > differently named models as a model mutates. We may want to work with >> > non-latest models. >> > - I see that the model name is the unique identifier. I realise this >> would >> > move away from the Oracle syntax – so may not be feasible short term; >> but I >> > wonder if we can have: >> > - a uuid as the main identifier and the model name as an attribute. >> > or >> > - a namespace (or something like a system of origin) >> > to help organise models with the same name. >> > - does the model have an owner? I assume that Flink model resource is >> the >> > master of the model? I imagine in the future that a model that comes in >> via >> > a new connector could be kept up to date with the external model and >> would >> > not be allowed to be changed by anything other than the connector. >> > >> >Kind regards, David. 
>> > >> > From: Hao Li >> > Date: Friday, 29 March 2024 at 16:30 >> > To: dev@flink.apache.org >> > Subject: [EXTERNAL] [VOTE] FLIP-437: Support ML Models in Flink SQL >> > Hi devs, >> > >> > I'd like to start a vote on the FLIP-437: Support ML Models in Flink >> > SQL [1]. The discussion thread is here [2]. >> > >> > The vote will be open for at least 72 hours unless there is an >> objection or >> > insufficient votes. >> > >> > [1] >> > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL >> > >> > [2] https://lists.apache.org/thread/9z94m2bv4w265xb5l2mrnh4lf9m28ccn >> > >> > Thanks, >> > Hao >> > >> > Unless otherwise stated above: >> > >> > IBM United Kingdom Limited >> > Registered in England and Wales with number 741598 >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU >> > >> >
[RESULT][VOTE] FLIP-437: Support ML Models in Flink SQL
Hi Dev, I'm happy to announce that FLIP-437: Support ML Models in Flink SQL [1] has been accepted with 7 approving votes (6 binding) [2] Timo Walther (binding) Jark Wu (binding) Yu Chen (non-binding) Piotr Nowojski (binding) Leonard Xu (binding) Martijn Visser (binding) David Moravek (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL [2] https://lists.apache.org/thread/gw1hfnqb05mwrstbtw43yh5tllrscgn6
Re: [DISCUSS] FLIP-507: Add Model DDL methods in TABLE API
Hi Yash, +1 for the proposal. Thanks, Hao On Mon, Feb 10, 2025 at 12:31 AM Yanquan Lv wrote: > Hi, Yash. Thanks for driving it. > +1 for this. > > > On Feb 7, 2025 at 05:28, Yash Anand wrote: > > > > Hi all! I would like to open up for discussion a new FLIP-507 [1]. > > Motivation: This proposal aims to extend model DDL support to the Table > API, > > enabling a seamless development experience where users can define, > manage, > > and utilize ML models entirely within the Table API ecosystem. > > Other model functions like ml_predict() will be added in follow-up > > FLIP(s). > > > > [1] https://cwiki.apache.org/confluence/x/fIxEF > > > > > > Best regards, > > Yash Anand > >
Re: [DISCUSS] FLIP-517: Better Handling of Dynamic Table Primitives with PTFs
Thanks Timo for the FLIP! This is a great improvement to the Flink SQL syntax around tables. I have two clarification questions: 1. For SEARCH_KEY ``` SELECT * FROM t_other, LATERAL SEARCH_KEY( input => t, on_key => DESCRIPTOR(k), lookup => t_other.name, options => MAP[ 'async', 'true', 'retry-predicate', 'lookup_miss', 'retry-strategy', 'fixed_delay', 'fixed-delay'='10s' ] ) ``` Table `t` needs to be an existing `LookupTableSource` [1], right? And we will rewrite it to `StreamPhysicalLookupJoin` [2] or a similar operator during the physical optimization phase. Also, to support passing options, do we need to extend `LookupContext` [3] to have a `getOptions` or `getRuntimeOptions` method? 2. For FROM_CHANGELOG ``` SELECT * FROM FROM_CHANGELOG(s) AS t; ``` Do we need to introduce a `DataStream` resource in SQL first? Hao [1] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java [2] https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 [3] https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 On Fri, Mar 21, 2025 at 6:25 AM Timo Walther wrote: > Hi everyone, > > I would like to start a discussion about FLIP-517: Better Handling of > Dynamic Table Primitives with PTFs [1]. > > In the past months, I have spent a significant amount of time with SQL > semantics and the SQL standard around PTFs, when designing and > implementing FLIP-440 [2]. For those of you that have not taken a look > into the standard, the concept of Polymorphic Table Functions (PTF) > enables syntax for implementing custom SQL operators. In my opinion, > they are kind of a revolution in the SQL language. 
PTFs can take scalar > values, tables, models (in Flink), and column lists as arguments. With > these primitives, we can further address shortcomings in the Flink SQL > language by leveraging syntax and semantics. > > I would like to introduce a couple of built-in PTFs with the goal to make > the handling of dynamic tables easier for users. Once users understand > how a PTF works, they can easily select from a list of functions to > approach a table for snapshots, changelogs, or searching. > > The FLIP proposes: > > SNAPSHOT() > SEARCH_KEY() > TO_CHANGELOG() > FROM_CHANGELOG() > > I'm aware that this is a delicate topic, and might lead to controversial > discussions. I hope with concise naming and syntax the benefit over the > existing syntax becomes clear. > > There are more useful PTFs to come, but those are the ones that I > currently see as the most fundamental ones to tell a round story around > Flink SQL. > > Looking forward to your feedback. > > Thanks, > Timo > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs > [2] > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 >
[DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi All, I would like to start a discussion about FLIP-525 [1]: Model ML_PREDICT, ML_EVALUATE Implementation Design. This FLIP is co-authored with Shengkai Fang. This FLIP is a follow-up of FLIP-437 [2] that proposes the implementation design for the ML_PREDICT and ML_EVALUATE functions introduced in FLIP-437. For more details, see FLIP-525 [1]. Looking forward to your feedback. [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL Thanks, Hao
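For context, these are the two functions whose implementation FLIP-525 designs. The ML_PREDICT call below uses the syntax settled on in the FLIP-437 thread; the ML_EVALUATE call is sketched by analogy and should be checked against the FLIP text:

```sql
-- Inference: appends prediction columns to each input row.
SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));

-- Evaluation: computes model-quality metrics over the input table.
SELECT * FROM
  ML_EVALUATE(
    input => `eval_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args  => DESCRIPTOR(f1, f2));
```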
[DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi All, I would like to start a discussion about FLIP-526 [1]: Model ML_PREDICT, ML_EVALUATE Table API. This FLIP is a follow-up of FLIP-507 [2] that proposes the Table API for model-related functions. This FLIP is also closely related to FLIP-525 [3], which proposes the implementation design for model-related functions. For more details, see FLIP-526 [1]. Looking forward to your feedback. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
Hi Timo, Thanks for the FLIP. +1 with a few questions: 1. Can `StructuredType` be nested? e.g. `STRUCTURED<'com.example.User', name STRING, age INT NOT NULL, address STRUCTURED<'com.example.address', street STRING, zip STRING>>` 2. What's the main reason the class won't be enforced in SQL? Since tables created in SQL can also be used in Table API, will it come as a surprise if it's working in SQL and then failing in Table API? What if `com.example.User` was not validated in SQL when creating the table, and then the class was created for something else with different fields; then in the Table API, it's not compatible. Hao On Tue, Apr 22, 2025 at 9:39 AM Timo Walther wrote: > Hi Arvid, Hi Sergey, > > thanks for your feedback. I updated the FLIP accordingly but let me > answer your questions > here as well: > > > Are we going to enforce that the name is a valid class name? What is > > happening if it's not a correct name? > > What are the implications of using a class that is not in the > > classpath in Table API? It looks to me that the name is metadata-only > > until we try to access the objects directly in Table/DataStream API. > > Names are not enforced or validated. They are pure metadata as mentioned > in Section 2.1. We fall back to Row as the conversion class if the name > cannot be resolved in the current classpath. So when staying in the SQL > ecosystem (i.e. not switching to Table API, DataStream API, or UDFs), > the class does not need to be present. > > > Should Expressions.objectOf(String, Object... kv); also have an > > overload where you can put in the StructuredType in case where > > the class is not in the CP? > > That makes a lot of sense. I added a DataTypes.STRUCTURED(String, > Field...) method and a Expressions.objectOf(String, Object...). > > > What is the expected outcome of supplying fewer keys than defined > > in the structured type? Are we going to make use of nullability here? > > If so, *_INSERT and *_REMOVE may have some use. 
> > Currently, we go with the most conservative approach, which means that > all keys need to be present. Maybe we can reserve this feature to future > work and make the logic more lenient. > > > Talking about nullability: Is there some option to make the declared > > fields NOT NULL? If so, could you amend one example to show that? > > (Grammar? implies that it's not possible) > > NOT NULL is supported similar to ROW. I adjusted one of > the examples. > > > One bigger concern is around the naming. For me, OBJECT is used for > > semi-structured types that are open. Your FLIP implies a closed design > > and that you want to add an open OBJECT later. I asked ChatGPT about > > other DB implementations and it seems like STRUCT is used more often > > (see below). So, I'd propose to call it STRUCT<...>, STRUCT_OF, > > > structOf, UPDATE_STRUCT, and updateStruct respectively. > > Naming is hard. I was also torn between STRUCT, STRUCTURED, or OBJECT. > In Flink, the ROW type is rather our STRUCT type, because it works fully > position based. Structured types might be name-based in the future for > better schema evolution, so they rather model an OBJECT type. This was > my reason for choosing OBJECT_OF (typed to class name and fixed fields) > vs. OBJECT (semi-structured without fixed fields). Snowflake also uses > OBJECT(i INT) (for structured types) and OBJECT (for semi structured > types). > > Also, both structured and semi-structured types can then share functions > such as UPDATE_OBJECT(). > > What do others think? 
> > Thanks, > Timo > > On 22.04.25 12:08, Sergey Nuyanzin wrote: > > Thanks for driving this Timo > > > > The FLIP seems reasonable to me > > > > I have one minor question/clarification > > do I understand it correct that after this FLIP we can execute of > > `typeof` against result of `OBJECT_OF` > > for instance > > SELECT typeof(OBJECT_OF( > >'com.example.User', > >'name', 'Bob', > >'age', 42 > > )); > > > > should return `STRUCTURED<'com.example.User', name STRING, age INT>` > > ? > > > > On Tue, Apr 22, 2025 at 10:57 AM Timo Walther > wrote: > >> > >> Hi everyone, > >> > >> I would like to ask again for feedback on this FLIP. It is a rather > >> small change but with big impact on usability for structured data. > >> > >> Are there any objections? Otherwise I would like to continue with voting > >> soon. > >> > >> Thanks, > >> Timo > >> > >> On 10.04.25 07:54, Timo Walther wrote: > >>> Hi everyone, > >>> > >>> I would like to start a discussion about FLIP-520: Simplify > >>> StructuredType handling [1]. > >>> > >>> Flink SQL already supports structured types in the engine, serializers, > >>> UDFs, and connector interfaces. However, currently only Table API was > >>> able to make use of them. While UDFs can take objects as input and > >>> return types, it is actually quite inconvenient to use them in > >>> transformations. > >>> > >>> This FLIP fixes some immediate blockers in the use of structured types. > >>> > >>> Looking f
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Yash, ML_EVALUATE itself will be a `TableAggregateFunction`. We will only provide one implementation in Flink which will be used in codegen. Only the ML_PREDICT function implementation can be based on providers. Flink will also provide a default implementation for it. Thanks, Hao On Mon, May 5, 2025 at 8:22 AM Yash Anand wrote: > Hi Hao, > > Thanks for the proposal, these are really interesting features to extend > Flink ML use case. > > +1 for the proposal. > > I just have one question, since you plan to extend > SqlMlFunctionTableFunction for both ML functions builtin registrations, > will ML_EVALUATE be an aggregate function or Table function? > > Thanks, > Yash Anand > > On Mon, May 5, 2025 at 4:18 AM Piotr Nowojski > wrote: > > > Hi, > > > > sounds like an interesting feature! > > > > Best, > > Piotrek > > > > On Tue, Apr 29, 2025 at 03:52, Shengkai Fang wrote: > > > Hi, Hao. > > > > > > Thanks for your proposal about ML related functions. This FLIP will > help > > > others to implement their own model provider. > > > > > > +1 for the proposal. > > > > > > Best, > > > Shengkai > > > > > > On Tue, Apr 29, 2025 at 07:22, Hao Li wrote: > > > > Hi All, > > > > > > > > I would like to start a discussion about FLIP-525 [1]: Model > > ML_PREDICT, > > > > ML_EVALUATE Implementation Design. This FLIP is co-authored with > > Shengkai > > > > Fang. > > > > > > > > This FLIP is a follow up of FLIP-437 [2] to propose the > implementation > > > > design for ML_PREDICT and ML_EVALUATE function which were introduced > in > > > > FLIP-437. > > > > > > > > For more details, see FLIP-525 [1]. Looking forward to your feedback. 
> > > > > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > [2] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > > > > > > > > > Thanks, > > > > Hao > > > > > > > > > >
Re: [VOTE] FLIP-520: Simplify StructuredType handling
+1 (non-binding) Thanks for driving this! Hao On Tue, May 6, 2025 at 8:27 AM Ferenc Csaky wrote: > +1 (binding) > > Thanks for driving this! > > Best, > Ferenc > > > > > On Tuesday, May 6th, 2025 at 14:50, Sergey Nuyanzin > wrote: > > > > > > > Thank you for driving this > > +1 (binding) > > > > On Tue, May 6, 2025, 12:04 Timo Walther twal...@apache.org wrote: > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-520: Simplify StructuredType handling > > > [1] which has been discussed in this thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > [1] > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-520%3A+Simplify+StructuredType+handling > > > [2] https://lists.apache.org/thread/mjxypb67bonyv2jf6vq4z6ttnwwxkz9c > > > > > > Cheers, > > > Timo >
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
I think Arvid has a good point. Why not define the Object type without a class, and when you get it in the Table API, try to cast it to some class? I found https://docs.oracle.com/javase/1.5.0/docs/guide/jdbc/getstart/mapping.html. Under the `JAVA_OBJECT` type section, they have: ``` ResultSet rs = stmt.executeQuery("SELECT ENGINEERS FROM PERSONNEL"); while (rs.next()) { Engineer eng = (Engineer)rs.getObject("ENGINEERS"); System.out.println(eng.lastName + ", " + eng.firstName); } ``` For us, how about adding a `getFieldAs(int pos, Class<T> clazz)` method to the Row type? Your example: ``` TableEnvironment env = ... Table t = env.sqlQuery("SELECT OBJECT_OF('com.example.User', 'name', 'Bob', 'age', 42)"); // Tries to resolve `com.example.User` in the classpath, if not present returns `Row` t.execute().collect(); ``` Will be ``` TableEnvironment env = ... Table t = env.sqlQuery("SELECT OBJECT_OF('name', 'Bob', 'age', 42)"); // No class attached; each field is cast explicitly per row for (Row row : t.execute().collect()) { User user = row.getFieldAs(0, User.class); } ``` For Arvid's question: "However, at that point, why do we actually need anything beyond ROW?" Maybe the difference is that the Row type shouldn't support being cast to a user-defined class, but `StructuredType` can be. Thanks, Hao On Wed, Apr 23, 2025 at 2:04 AM Arvid Heise wrote: > Hi Timo, > > thanks for addressing my points. I'm not set on using STRUCT et al. but > wanted to point out the alternatives. > > Regarding the attached class name, I have similar confusion to Hao. I > wonder if Structured types shouldn't be anonymous by default in the sense > that initially we don't attach a class name to it. As you pointed out, it > has no real semantics in SQL and we can't validate it. > Another thing to consider is that if one user creates a table through some > means and another user wants to consume it, the second user may not have > access to the class as is. 
But the user could easily create a compatible > class on its own. > > Consequently, I'm thinking about getting rid of the type at all. Only on > the edges, we can use conversion to the user types when users actually > access the ROW: > * Any table API access that wants to collect results (in your last example > what is t.execute().collect(); returning? How does that work in the > multi-user setup sketched above? Wouldn't it be easier that the consumer > explicitly gives us the POJO type that it expects?) > * Any DataStream conversion > * Any UDF > > However, at that point, why do we actually need anything beyond ROW? > > Best, > > Arvid > > On Wed, Apr 23, 2025 at 8:52 AM Timo Walther wrote: > > > Hi Hao, > > > > 1. Can `StructuredType` be nested? > > > > Yes this is supported. > > > > 2. What's the main reason the class won't be enforced in SQL? > > > > SQL should not care about classes. Within the SQL ecosystem, the SQL > > engine controls the data serialization and protocols. The SQL engine > > will not load the class. Classes are a concept of a JVM or Python API > > endpoint. This also the reason why a SQL ARRAY can be > > represented as List, long[], Long[]. The latter are only concepts > > in the target programming language and might look different in Python. > > > > Regard, > > Timo > > > > > > On 22.04.25 23:54, Hao Li wrote: > > > Hi Timo, > > > > > > Thanks for the FLIP. +1 with a few questions: > > > > > > 1. Can `StructuredType` be nested? e.g. `STRUCTURED<'com.example.User', > > > name STRING, age INT NOT NULL, address > STRUCTURED<'com.example.address', > > > street STRING, zip STRING>>` > > > > > > 2. What's the main reason the class won't be enforced in SQL? Since > > tables > > > created in SQL can also be used in Table API, will it come as a > surprise > > if > > > it's working in SQL and then failing in Table API? 
What if > > > `com.example.User` was not validated in SQL when creating table, then > the > > > class was created for something else with different fields and then in > > > Table api, it's not compatible. > > > > > > Hao > > > > > > On Tue, Apr 22, 2025 at 9:39 AM Timo Walther > wrote: > > > > > >> Hi Arvid, Hi Sergey, > > >> > > >> thanks for your feedback. I updated the FLIP accordingly but let me > > >> answer your questions > > >> here as wel
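The `getFieldAs(int pos, Class)` accessor proposed in this thread could look roughly like the following. This is a minimal, self-contained illustration only: `SimpleRow` is a stand-in for Flink's `Row`, and the method name and signature follow the proposal in the email, not any existing Flink API.

```java
import java.util.List;

// Minimal stand-in for Flink's Row, sketching the proposed
// getFieldAs(int pos, Class<T> clazz) accessor: return the field at the
// given position, cast to the caller-supplied class.
final class SimpleRow {
    private final List<Object> fields;

    SimpleRow(List<Object> fields) {
        this.fields = fields;
    }

    // Cast the field at `pos` to `clazz`, failing with a descriptive error
    // instead of a bare ClassCastException at the call site.
    <T> T getFieldAs(int pos, Class<T> clazz) {
        Object value = fields.get(pos);
        if (value != null && !clazz.isInstance(value)) {
            throw new ClassCastException(
                "Field " + pos + " is " + value.getClass().getName()
                    + ", not " + clazz.getName());
        }
        return clazz.cast(value);
    }
}
```

With such an accessor, a consumer that knows the target class (e.g. `User`) can recover typed objects from an untyped row without the engine ever loading the class, which is the multi-user scenario Arvid describes.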
Re: [DISCUSS] FLIP-520: Simplify StructuredType handling
Hi Timo, Thanks for the clarification. It's very helpful. For the classpath, I suppose it can also support Python later if it's called from the Python Table API? Do we want to indicate whether it's a Java classpath or a Python class? Or do we support a list of class paths that can contain Python, Java, or other languages later? Thanks, Hao On Thu, Apr 24, 2025 at 5:34 AM Arvid Heise wrote: > Hi Timo, > > thank you very much for responding. I see that this is just the first step > to get consistency between SQL and Table API and more work is to come. > > I still think that there is some redundancy between STRUCT and ROW but tbh > I have more issues with ROW than with STRUCT. (What is even the meaning of > a nested ROW?) > > So +1 with your proposal and maybe we can deprecate ROW at some later point > in time. > > Best, > > Arvid > > On Thu, Apr 24, 2025 at 11:57 AM Timo Walther wrote: > > > Hi Arvid, Hi Hao, > > > > thanks for this valuable feedback. Let me clarify a few things before I > > go into the details. > > > > Just to avoid any confusion: the FLIP does not propose introducing the > > StructuredType. Structured types backed by classes already exist in > > Flink for years and are already supported in UDFs, Table.collect(), > > StreamTableEnvironment.toDataStream, and connectors. Structured types > > have been introduced for a better programmatic story in Table API. They > > avoid the need for manually defining the full schema at the edges. > > Manual schema work is annoying and with structured types it is possible > > to use classes wherever a type is expected. > > > > The goal of this FLIP is only to bring Table API and SQL closer together. > > In general, this is only the first step of my larger vision of > > structured data handling. 
There are basically 3 kinds of structured > types: > > > > 1) a typed, fixed field struct like STRUCTURED<'Money', i INT, s STRING> > > 2) an untyped, fixed field struct like STRUCTURED > > (similar to Snowflake OBJECT(i INT, s STRING)) > > 3) an untyped struct for semi-structured data like STRUCTURED (similar > > to Snowflake OBJECT) > > > > RowType represents 2), StructuredType represents 1) and a future > > semi-structured type can represent 3) (but out of scope for this FLIP). > > > > If we don't support a typed struct, Money(i INT) and User(i INT) are not > > distinct in SQL. For table.collect() or eval(Row row) in UDFs, it would > > mean that those need the full schema declaration in order to map to a > > target type. Structured types avoid all of that and make Table API very > > powerful. > > > > Usually both the UDF and the collect()/toDataStream() are defined in the > > same Table API program. Thus, the class is usually present in the same > > classpath and this becomes less of an issue in production. Casting > > structured types to ROW is also supported. > > > > The implementation effort of this FLIP is very low. It's mostly intended > > to fill missing gaps, no major overhaul of the type system. Also to > > avoid any backwards compatibility issues. > > > > Let me know what you think. > > > > Cheers, > > Timo > > > > On 23.04.25 21:27, Hao Li wrote: > > > I think Arvid has a good point. Why not define Object type without > class > > > and when you get it in table api, try to cast it to some class? I found > > > > > > https://docs.oracle.com/javase/1.5.0/docs/guide/jdbc/getstart/mapping.html > > . > > > Under `JAVA_OBJECT` type section. 
They have: > > > > > > ``` > > > > > > ResultSet rs = stmt.executeQuery("SELECT ENGINEERS FROM PERSONNEL"); > > > while (rs.next()) { > > > Engineer eng = (Engineer)rs.getObject("ENGINEERS"); > > > System.out.println(eng.lastName + ", " + eng.firstName); > > > } > > > > > > ``` > > > > > > For us, how about add `getFieldAs(int post, Class class)` method in Row > > > type? Your example: > > > > > > ``` > > > > > > TableEnvironment env = ... > > > > > > Table t = env.sqlQuery("SELECT OBJECT_OF('com.example.User', 'name', > > 'Bob', > > > 'age', 42)"); > > > > > > // Tries to resolve `com.example.User` in the classpath, if not present > > > returns `Row` > > > t.execute().collect(); > > > ``` > > > > > > Will be > > > ``` > > > TableEnvironment env = ... > &g
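The "resolve the class in the classpath, if not present fall back to a generic row" behaviour discussed above could be sketched as follows. Everything here is illustrative, not Flink's implementation: `ObjectOfSketch` and its reflective field copying are assumptions, the nested `User` class is a hypothetical stand-in for `com.example.User`, and a plain `Map` stands in for Flink's `Row` as the untyped fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not Flink code): try to resolve the class and build
// a typed instance from the key/value pairs; if the class is absent from
// the classpath, fall back to an untyped representation.
final class ObjectOfSketch {

    // Hypothetical demo POJO standing in for com.example.User.
    public static class User {
        public String name;
        public int age;
    }

    static Object objectOf(String className, Object... kvPairs) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < kvPairs.length; i += 2) {
            fields.put((String) kvPairs[i], kvPairs[i + 1]);
        }
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> e : fields.entrySet()) {
                var field = clazz.getDeclaredField(e.getKey());
                field.setAccessible(true);
                field.set(instance, e.getValue());
            }
            return instance; // class was on the classpath: typed object
        } catch (ReflectiveOperationException e) {
            return fields; // class absent: untyped fallback (stand-in for Row)
        }
    }
}
```

This mirrors the multi-user concern in the thread: the producer can attach a class name, and a consumer without that class still gets usable untyped data.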
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Thanks Mayank for the proposal. I think it's a great addition to Flink to define secure connectivity in general for table, model and other resources later on. +1. Hao On Fri, May 2, 2025 at 5:32 AM Gustavo de Morais wrote: > Hi Mayank, > > Thanks for the initiative. Looking at the FLIP, this looks like a > well-thought-out proposal that addresses a clear need for more secure and > reusable external connections in Flink SQL and Table API. Separating > connection details would a valuable improvement. > > Best Regards, > Gustavo > > Am Fr., 2. Mai 2025 um 07:12 Uhr schrieb Ferenc Csaky > : > > > Hi Mayank, > > > > Thank you for starting the discussion! In general, I think such > > functionality > > would be a really great addition to Flink. > > > > Could you pls. elaborate a bit more one what is the reason of defining a > > `connection` resource on the database level instead of the catalog level? > > If I think about `JdbcCatalog`, or `HiveCatalog`, the catalog is in > 1-to-1 > > mapping with an RDBMS, or a HiveMetastore, so my initial thinking is > that a > > `connection` seems more like a catalog level resource. > > > > WDYT? > > > > Thanks, > > Ferenc > > > > > > > > On Tuesday, April 29th, 2025 at 17:08, Mayank Juneja < > > mayankjunej...@gmail.com> wrote: > > > > > > > > > > > Hi all, > > > > > > I would like to open up for discussion a new FLIP-529 [1]. > > > > > > Motivation: > > > Currently, Flink SQL handles external connectivity by defining > endpoints > > > and credentials in table configuration. This approach prevents > > reusability > > > of these connections and makes table definition less secure by exposing > > > sensitive information. > > > We propose the introduction of a new "connection" resource in Flink. > This > > > will be a pluggable resource configured with a remote endpoint and > > > associated access key. 
Once defined, connections can be reused across > > table > > > definitions, and eventually for model definition (as discussed in > > FLIP-437) > > > for inference, enabling seamless and secure integration with external > > > systems. > > > The connection resource will provide a new, optional way to manage > > external > > > connectivity in Flink. Existing methods for table definitions will > remain > > > unchanged. > > > > > > [1] https://cwiki.apache.org/confluence/x/cYroF > > > > > > Best Regards, > > > Mayank Juneja > > >
Re: [VOTE] FLIP-507: Add Model DDL methods in TABLE API
+1 (non-binding) Thanks Yash, Hao On Tue, Feb 18, 2025 at 10:46 AM Yash Anand wrote: > Hi Everyone, > > I'd like to start a vote on FLIP-507: Add Model DDL methods in TABLE API > [1] which has been discussed in this thread [2]. > > The vote will be open for at least 72 hours unless there is an objection or > not enough votes. > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > [2] https://lists.apache.org/thread/w9dt6y1w0yns5j3g4685tstjdg5flvy9 >
Re: [DISCUSS] FLIP-517: Better Handling of Dynamic Table Primitives with PTFs
Hi Timo, One question I have is: what's the SEARCH_KEY result schema you have in mind? Can it output multiple rows for every row in the left table, or does it need to pack the result into a single row as an array? Thanks, Hao On Mon, Mar 24, 2025 at 10:20 AM Hao Li wrote: > Thanks Timo for the FLIP! This is a great improvement to the Flink SQL > syntax around tables. I have two clarification questions: > > 1. For SEARCH_KEY > ``` > SELECT * > FROM > t_other, > LATERAL SEARCH_KEY( > input => t, > on_key => DESCRIPTOR(k), > lookup => t_other.name, > options => MAP[ > 'async', 'true', > 'retry-predicate', 'lookup_miss', > 'retry-strategy', 'fixed_delay', > 'fixed-delay', '10s' > ] > ) > ``` > Table `t` needs to be an existing `LookupTableSource` [1], right? And we > will rewrite it to `StreamPhysicalLookupJoin` [2] or similar operator > during the physical optimization phase. > Also to support passing options, we need to extend `LookupContext` [3] to > have a `getOptions` or `getRuntimeOptions` method? > > 2. For FROM_CHANGELOG > ``` > SELECT * FROM FROM_CHANGELOG(s) AS t; > ``` > Do we need to introduce a `DataStream` resource in SQL first? > > > Hao > > > > > [1] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java > [2] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 > [3] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 > > > On Fri, Mar 21, 2025 at 6:25 AM Timo Walther wrote: >> Hi everyone, >> >> I would like to start a discussion about FLIP-517: Better Handling of >> Dynamic Table Primitives with PTFs [1]. 
>> >> In the past months, I have spent a significant amount of time with SQL >> semantics and the SQL standard around PTFs, when designing and >> implementing FLIP-440 [2]. For those of you that have not taken a look >> into the standard, the concept of Polymorphic Table Functions (PTF) >> enables syntax for implementing custom SQL operators. In my opinion, >> they are kind of a revolution in the SQL language. PTFs can take scalar >> values, tables, models (in Flink), and column lists as arguments. With >> these primitives, we can further evolve shortcomings in the Flink SQL >> language by leveraging syntax and semantics. >> >> I would like introduce a couple of built-in PTFs with the goal to make >> the handling of dynamic tables easier for users. Once users understand >> how a PTF works, they can easily select from a list of functions to >> approach a table for snapshots, changelogs, or searching. >> >> The FLIP proposes: >> >> SNAPSHOT() >> SEARCH_KEY() >> TO_CHANGELOG() >> FROM_CHANGELOG() >> >> I'm aware that this is a delicate topic, and might lead to controversial >> discussions. I hope with concise naming and syntax the benefit over the >> existing syntax becomes clear. >> >> There are more useful PTFs to come, but those are the ones that I >> currently see as the most fundamental ones to tell a round story around >> Flink SQL. >> >> Looking forward to your feedback. >> >> Thanks, >> Timo >> >> [1] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs >> [2] >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 >> >
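The `options` map passed to SEARCH_KEY above would presumably be parsed into some typed lookup/retry configuration on the planner side. A rough sketch of that parsing, with purely illustrative names (`LookupRetryConfig` and its fields are assumptions, not Flink API):

```java
import java.util.Map;

// Hypothetical sketch: turn the string-to-string `options` map from the
// SEARCH_KEY example into a typed configuration. Defaults mirror the
// values shown in the SQL example above.
record LookupRetryConfig(boolean async, String retryPredicate,
                         String retryStrategy, String fixedDelay) {

    static LookupRetryConfig fromOptions(Map<String, String> options) {
        return new LookupRetryConfig(
            Boolean.parseBoolean(options.getOrDefault("async", "false")),
            options.getOrDefault("retry-predicate", "lookup_miss"),
            options.getOrDefault("retry-strategy", "fixed_delay"),
            options.getOrDefault("fixed-delay", "10s"));
    }
}
```

Something like this is also what a `getOptions`-style method on `LookupContext`, as raised in the thread, would expose to the connector.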
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Ron, I found these names in other systems: `task_type` in big query ML [1] `model_type` in databricks [2] `task` is more of an abbreviated version from `task_type`. [1] https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate [2] https://www.databricks.com/blog/2022/04/19/model-evaluation-in-mlflow.html Thanks, Hao On Tue, May 6, 2025 at 10:55 PM Ron Liu wrote: > > It's mainly used for model evaluation purposes for `ML_EVALUATE`. > Different > loss functions will be used and different metrics will be output for > `ML_EVALUATE` based on the task option of the model. Task option is not > necessary if the model > is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overloading method > which can override the task type during evaluation. > > From your explanation, I personally feel that it might be more appropriate > to replace task with a word more suited to the scenario, but of course I > don't have a good suggestion at the moment, just a suggestion. > > Best, > Ron > > Hao Li 于2025年5月7日周三 11:24写道: > > > Hi Yunfeng, Ron, > > > > Thanks for the feedback. > > > > > it might be better to change the configuration api_key to apikey > > Make sense. I updated the FLIP. > > > > > Why is it necessary to define the task option in the WITH clause of the > > Model DDL, and what is its purpose? > > It's mainly used for model evaluation purposes for `ML_EVALUATE`. > Different > > loss functions will be used and different metrics will be output for > > `ML_EVALUATE` based on the task option of the model. Task option is not > > necessary if the model > > is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overloading > method > > which can override the task type during evaluation. > > > > Apart from evaluation, in the future, if model training is supported in > > Flink, it can also serve the purpose of how the model can be trained. 
> > > > > About the CatalogModel interface, why does it need `getInputSchema` and > > `getOutputSchema` methods? What is the role of Schema? > > Schema is mainly to specific the input and output data type of the model > > when it's used in prediction. During prediction, `ML_PREDICT` takes > columns > > from the input table matching the models input schema types and output > > columns based on the model's output schema type. > > > > > Regarding the ModelProvider interface, what is the role of the copy > > method? > > I think it can be useful in the future if we need to copy it during the > > planning stage and apply mutations to the provider. But it may not be > used > > for now. I'm also ok to remove it. > > > > > > Hope this answers your question. > > > > Thanks, > > Hao > > > > > > On Tue, May 6, 2025 at 7:49 PM Ron Liu wrote: > > > > > Hi, Hao > > > > > > Thanks for starting this proposal, it's a great feature, +1. > > > > > > Since I was missing some context, I went to FLIP-437. Combining these > two > > > FLIPs, I have the following three questions: > > > 1. Why is it necessary to define the task option in the WITH clause of > > the > > > Model DDL, and what is its purpose? I understand that one model can > > support > > > various types of tasks such as regression, classification, clustering, > > > etc... But the example you have given gives me the impression that > model > > > can only perform a specific type of task, which confuses me. I think > the > > > task option is not needed > > > > > > 2. About the CatalogModel interface, why does it need `getInputSchema` > > and > > > `getOutputSchema` method, What is the role of Schema? > > > > > > 3. Regarding the ModelProvider interface, what is the role of the copy > > > method? Since I don't know much about the implementation details, I'm > > > curious about what cases need to be copied. 
> > > > > > > > > Best, > > > Ron > > > > > > Yunfeng Zhou 于2025年5月7日周三 09:33写道: > > > > > > > Hi Hao, > > > > > > > > Thanks for the FLIP! It provides a clearer guideline for developers > to > > > > implement model functions. > > > > > > > > One minor comment: it might be better to change the configuration > > api_key > > > > to apikey, which corresponds to GlobalConfiguration.SENSITIVE_KEYS. > > > > Otherwise users’ secrets might be exposed in logs and cause security > > &g
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Fri, May 9, 2025 at 12:43 AM Martijn Visser wrote: > +1 (binding) > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou > wrote: > > > +1(non-binding) > > > > Best, > > Yunfeng > > > > > 2025年5月9日 00:01,Hao Li 写道: > > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, ML_EVALUATE > > > Implementation Design [1], which has been discussed in this thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > Thanks, > > > Hao > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > [2] https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x > > > > >
[RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1] has been accepted with 6 approving votes (3 binding) [2] Yash Anand (non-binding) Mayank Juneja (non-binding) Shengkai Fang (binding) Ron Liu (binding) Yunfeng Zhou (non-binding) Martijn Visser (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55
Re: [VOTE] FLIP-516: Multi-Way Join Operator
+1 (non-binding) Thanks, Hao On Fri, May 9, 2025 at 3:41 AM Arvid Heise wrote: > +1 (binding) > > Cheers > > On Wed, May 7, 2025 at 6:37 PM Gustavo de Morais > wrote: > > > Hi everyone, > > > > I'd like to start voting on FLIP-516: Multi-Way Join Operator [1]. The > > discussion can be found in this thread [2]. > > The vote will be open for at least 72 hours, unless there is an objection > > or not enough votes. > > > > Thanks, > > > > Gustavo de Morais > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-516%3A+Multi-Way+Join+Operator > > > > [2] https://lists.apache.org/thread/9b9cy8o2qjt7w2n7j0g4bbrwvy9n61xv > > >
[VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi everyone, I'd like to start a vote on FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1], which has been discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Ron, > whether the predict or evaluate introduced in this FLIP can be serialized to SQL? Yes. It can be serialized to SQL. We just need to provide serialization function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. Thanks, Hao On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > > Hi, Hao > > +1 for the proposal. > > Due to [1][2] having supported serialize table API to SQL, so I've one > question: whether the predict or evaluate introduced in this FLIP can be > serialized to SQL? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable > [2] > https://cwiki.apache.org/confluence/display/FLINK/FLIP-502%3A+QueryOperation+SQL+Serialization+customization > > > Kindly reminder: The FLIP link [1] is error in your first email. > > Best, > Ron > > Yash Anand 于2025年5月5日周一 23:24写道: > > > Hi, Hao. > > > > +1 for the proposal. > > > > Thanks > > Yas > > > > On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > > wrote: > > > > > Hi, > > > > > > sounds like a valuable addition! > > > > > > Best, > > > Piotrek > > > > > > wt., 29 kwi 2025 o 03:57 Shengkai Fang napisał(a): > > > > > > > Hi, Hao. > > > > > > > > +1 for the proposal. > > > > > > > > Best, > > > > Shengkai > > > > > > > > Hao Li 于2025年4月29日周二 07:27写道: > > > > > > > > > Hi All, > > > > > > > > > > I would like to start a discussion about FLIP-526 [1]: Model > > > ML_PREDICT, > > > > > ML_EVALUATE Table API. > > > > > > > > > > This FLIP is a follow up of FLIP-507 [2] to propose the table api for > > > > model > > > > > related functions. This FLIP is also closely related to FLIP-525 [3] > > > > which > > > > > is the proposal for model related function implementation design. > > > > > > > > > > For more details, see FLIP-526 [1]. Looking forward to your feedback. 
> > > > > > > > > > Thanks, > > > > > Hao > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > [2] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > > > > > [3] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > > > > > > > > >
[VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi everyone, I'd like to start a vote on FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API [1], which has been discussed in this thread [2]. The vote will be open for at least 72 business hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API [2] https://lists.apache.org/thread/hlmm9l6qr3pdhs8oc8ltn4pf0s6tg3x4
Re: [DISCUSS] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Yunfeng, Ron, Thanks for the feedback. > it might be better to change the configuration api_key to apikey Makes sense. I updated the FLIP. > Why is it necessary to define the task option in the WITH clause of the Model DDL, and what is its purpose? It's mainly used for model evaluation purposes in `ML_EVALUATE`. Different loss functions will be used and different metrics will be output by `ML_EVALUATE` based on the task option of the model. The task option is not necessary if the model is not used in `ML_EVALUATE`. `ML_EVALUATE` also has an overload which can override the task type during evaluation. Apart from evaluation, in the future, if model training is supported in Flink, it can also indicate how the model should be trained. > About the CatalogModel interface, why does it need `getInputSchema` and `getOutputSchema` methods? What is the role of Schema? The schema mainly specifies the input and output data types of the model when it's used in prediction. During prediction, `ML_PREDICT` takes columns from the input table matching the model's input schema types and outputs columns based on the model's output schema types. > Regarding the ModelProvider interface, what is the role of the copy method? I think it can be useful in the future if we need to copy it during the planning stage and apply mutations to the provider. But it may not be used for now. I'm also ok to remove it. Hope this answers your questions. Thanks, Hao On Tue, May 6, 2025 at 7:49 PM Ron Liu wrote: > Hi, Hao > > Thanks for starting this proposal, it's a great feature, +1. > > Since I was missing some context, I went to FLIP-437. Combining these two > FLIPs, I have the following three questions: > 1. Why is it necessary to define the task option in the WITH clause of the > Model DDL, and what is its purpose? I understand that one model can support > various types of tasks such as regression, classification, clustering, > etc... 
But the example you have given gives me the impression that model > can only perform a specific type of task, which confuses me. I think the > task option is not needed > > 2. About the CatalogModel interface, why does it need `getInputSchema` and > `getOutputSchema` method, What is the role of Schema? > > 3. Regarding the ModelProvider interface, what is the role of the copy > method? Since I don't know much about the implementation details, I'm > curious about what cases need to be copied. > > > Best, > Ron > > Yunfeng Zhou 于2025年5月7日周三 09:33写道: > > > Hi Hao, > > > > Thanks for the FLIP! It provides a clearer guideline for developers to > > implement model functions. > > > > One minor comment: it might be better to change the configuration api_key > > to apikey, which corresponds to GlobalConfiguration.SENSITIVE_KEYS. > > Otherwise users’ secrets might be exposed in logs and cause security > risks. > > > > Best, > > Yunfeng > > > > > > > 2025年4月29日 07:22,Hao Li 写道: > > > > > > Hi All, > > > > > > I would like to start a discussion about FLIP-525 [1]: Model > ML_PREDICT, > > > ML_EVALUATE Implementation Design. This FLIP is co-authored with > Shengkai > > > Fang. > > > > > > This FLIP is a follow up of FLIP-437 [2] to propose the implementation > > > design for ML_PREDICT and ML_EVALUATE function which were introduced in > > > FLIP-437. > > > > > > For more details, see FLIP-525 [1]. Looking forward to your feedback. > > > > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > [2] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL > > > > > > > > > Thanks, > > > Hao > > > > >
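The schema-matching behaviour described for `ML_PREDICT` in this thread can be sketched as a standalone type check. This is illustrative only: `ModelSchemaCheck` is a hypothetical helper, types are represented as plain strings, and appending the model's output columns to the input columns is an assumption about the result shape, not a statement of Flink's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the check described above: ML_PREDICT's input
// columns must match the model's declared input schema, and the result
// carries the model's declared output columns.
final class ModelSchemaCheck {

    // Returns the result column types, assuming (as a sketch) that the
    // model's output columns are appended to the matched input columns.
    static List<String> predictResultTypes(
            List<String> inputColumnTypes,
            List<String> modelInputSchema,
            List<String> modelOutputSchema) {
        if (!inputColumnTypes.equals(modelInputSchema)) {
            throw new IllegalArgumentException(
                "Input columns " + inputColumnTypes
                    + " do not match model input schema " + modelInputSchema);
        }
        List<String> result = new ArrayList<>(inputColumnTypes);
        result.addAll(modelOutputSchema);
        return result;
    }
}
```

This is essentially why `CatalogModel` needs `getInputSchema` and `getOutputSchema`: without them, the planner has nothing to validate the input columns against and no way to derive the output row type.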
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi all, If there are no more discussions. I will start voting tomorrow. Thanks, Hao On Thu, May 8, 2025 at 10:38 AM Hao Li wrote: > Hi Ron, > > > whether the predict or evaluate introduced in this FLIP can be > serialized to SQL? > > Yes. It can be serialized to SQL. We just need to provide serialization > function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. > > Thanks, > Hao > > > On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > > > > Hi, Hao > > > > +1 for the proposal. > > > > Due to [1][2] having supported serialize table API to SQL, so I've one > > question: whether the predict or evaluate introduced in this FLIP can be > > serialized to SQL? > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-393%3A+Make+QueryOperations+SQL+serializable > > [2] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-502%3A+QueryOperation+SQL+Serialization+customization > > > > > > Kindly reminder: The FLIP link [1] is error in your first email. > > > > Best, > > Ron > > > > Yash Anand 于2025年5月5日周一 23:24写道: > > > > > Hi, Hao. > > > > > > +1 for the proposal. > > > > > > Thanks > > > Yas > > > > > > On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > > > wrote: > > > > > > > Hi, > > > > > > > > sounds like a valuable addition! > > > > > > > > Best, > > > > Piotrek > > > > > > > > wt., 29 kwi 2025 o 03:57 Shengkai Fang > napisał(a): > > > > > > > > > Hi, Hao. > > > > > > > > > > +1 for the proposal. > > > > > > > > > > Best, > > > > > Shengkai > > > > > > > > > > Hao Li 于2025年4月29日周二 07:27写道: > > > > > > > > > > > Hi All, > > > > > > > > > > > > I would like to start a discussion about FLIP-526 [1]: Model > > > > ML_PREDICT, > > > > > > ML_EVALUATE Table API. > > > > > > > > > > > > This FLIP is a follow up of FLIP-507 [2] to propose the table > api for > > > > > model > > > > > > related functions. 
This FLIP is also closely related to FLIP-525 > [3] > > > > > which > > > > > > is the proposal for model related function implementation design. > > > > > > > > > > > > For more details, see FLIP-526 [1]. Looking forward to your > feedback. > > > > > > > > > > > > Thanks, > > > > > > Hao > > > > > > > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > [2] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > > > > > > [3] > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > > > > > > > > > > > > > >
Re: [DISCUSS] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Timo, > I don't fully understand why we need BuiltinFunctionQueryOperation and BuiltInCallExpression The main reason is that `CallExpression` and `FunctionQueryOperation` seem to be for `FunctionDefinition` instead of `SqlFunction`. So I want to make new classes just for `SqlFunction`. But of course we can add more information to those existing classes to cater for `SqlFunction` as well. I agree these are implementation details. I'll remove these function definitions and we can discuss them in PR. Thanks, Hao On Mon, May 12, 2025 at 7:58 AM Timo Walther wrote: > One last thing to mention: I don't fully understand why we need > BuiltinFunctionQueryOperation and BuiltInCallExpression. But we can > discuss this in the PR as this is not public API. In any case, we should > make sure to keep the changes to visitors and concepts in Table API > small. In the end, most stuff should be reusable from the PTF work. > > Cheers, > Timo > > > On 12.05.25 16:46, Timo Walther wrote: > > +1 to continue to voting. > > > > Thanks, > > Timo > > > > On 11.05.25 18:47, Hao Li wrote: > >> Hi all, > >> > >> If there are no more discussions. I will start voting tomorrow. > >> > >> Thanks, > >> Hao > >> > >> On Thu, May 8, 2025 at 10:38 AM Hao Li wrote: > >> > >>> Hi Ron, > >>> > >>>> whether the predict or evaluate introduced in this FLIP can be > >>> serialized to SQL? > >>> > >>> Yes. It can be serialized to SQL. We just need to provide serialization > >>> function in `ModelReferenceExpression` and `ModelSourceQueryOperation`. > >>> > >>> Thanks, > >>> Hao > >>> > >>> > >>> On Thu, May 8, 2025 at 12:09 AM Ron Liu wrote: > >>>> > >>>> Hi, Hao > >>>> > >>>> +1 for the proposal. > >>>> > >>>> Due to [1][2] having supported serialize table API to SQL, so I've one > >>>> question: whether the predict or evaluate introduced in this FLIP > >>>> can be > >>>> serialized to SQL? 
> >>>> > >>>> [1] > >>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-393%3A+Make+QueryOperations+SQL+serializable > >>>> [2] > >>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-502%3A+QueryOperation+SQL+Serialization+customization > >>>> > >>>> > >>>> Kindly reminder: The FLIP link [1] is error in your first email. > >>>> > >>>> Best, > >>>> Ron > >>>> > >>>> Yash Anand 于2025年5月5日周一 23:24写道: > >>>> > >>>>> Hi, Hao. > >>>>> > >>>>> +1 for the proposal. > >>>>> > >>>>> Thanks > >>>>> Yas > >>>>> > >>>>> On Mon, May 5, 2025 at 4:17 AM Piotr Nowojski > >>>>> wrote: > >>>>> > >>>>>> Hi, > >>>>>> > >>>>>> sounds like a valuable addition! > >>>>>> > >>>>>> Best, > >>>>>> Piotrek > >>>>>> > >>>>>> wt., 29 kwi 2025 o 03:57 Shengkai Fang > >>> napisał(a): > >>>>>> > >>>>>>> Hi, Hao. > >>>>>>> > >>>>>>> +1 for the proposal. > >>>>>>> > >>>>>>> Best, > >>>>>>> Shengkai > >>>>>>> > >>>>>>> Hao Li 于2025年4月29日周二 07:27写道: > >>>>>>> > >>>>>>>> Hi All, > >>>>>>>> > >>>>>>>> I would like to start a discussion about FLIP-526 [1]: Model > >>>>>> ML_PREDICT, > >>>>>>>> ML_EVALUATE Table API. > >>>>>>>> > >>>>>>>> This FLIP is a follow up of FLIP-507 [2] to propose the table > >>> api for > >>>>>>> model > >>>>>>>> related functions. This FLIP is also closely related to FLIP-525 > >>> [3] > >>>>>>> which > >>>>>>>> is the proposal for model related function implementation design. > >>>>>>>> > >>>>>>>> For more details, see FLIP-526 [1]. Looking forward to your > >>> feedback. 
> >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Hao > >>>>>>>> > >>>>>>>> > >>>>>>>> [1] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > >>>>>>>> [2] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API > >>>>>>>> [3] > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> https://cwiki.apache.org/confluence/display/FLINK/ > >>> FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>> > >> > > > >
Re: [RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Timo, Thanks for letting me know. I'll reopen the vote. Should we update the FLIP wiki [1] to include the 72 business hours? [1] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals Thanks, Hao On Mon, May 12, 2025 at 5:50 AM Timo Walther wrote: > Hi Hao, > > please note that 72 hours in VOTE threads refers to business days so > excluding weekends. This vote should be open for at least 1 more day. > > Regards, > Timo > > > > On 11.05.25 18:19, Hao Li wrote: > > Hi Dev, > > > > I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE > > Implementation Design [1] has been accepted with 6 approving votes (3 > > binding) [2] > > > > Yash Anand (non-binding) > > Mayank Juneja (non-binding) > > Shengkai Fang (binding) > > Ron Liu (binding) > > Yunfeng Zhou (non-binding) > > Martijn Visser (binding) > > > > Thanks, > > Hao > > > > [1] > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 > > > >
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi all, I was reminded that 72 hours means business days. So the vote is still open :) Sorry about the confusion. Thanks, Hao On Mon, May 12, 2025 at 1:18 AM Sergey Nuyanzin wrote: > +1 (binding) > > On Mon, May 12, 2025 at 9:55 AM Piotr Nowojski > wrote: > > > > +1 (binding) > > > > Best, > > Piotrek > > > > niedz., 11 maj 2025 o 18:14 Hao Li > napisał(a): > > > > > Hi Dev, > > > > > > Thanks all for voting. I'm closing the vote and the result will be > posted > > > in a separate email. > > > > > > Thanks, > > > Hao > > > > > > On Fri, May 9, 2025 at 12:43 AM Martijn Visser < > martijnvis...@apache.org> > > > wrote: > > > > > > > +1 (binding) > > > > > > > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou < > flink.zhouyunf...@gmail.com > > > > > > > > wrote: > > > > > > > > > +1(non-binding) > > > > > > > > > > Best, > > > > > Yunfeng > > > > > > > > > > > 2025年5月9日 00:01,Hao Li 写道: > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, > ML_EVALUATE > > > > > > Implementation Design [1], which has been discussed in this > thread > > > [2]. > > > > > > > > > > > > The vote will be open for at least 72 hours unless there is an > > > > objection > > > > > > or not enough votes. > > > > > > > > > > > > Thanks, > > > > > > Hao > > > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design > > > > > > [2] > https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x > > > > > > > > > > > > > > > > > > > > > -- > Best regards, > Sergey >
Re: [RESULT][VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi Dev, I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design [1] has been accepted with 8 approving votes (5 binding) [2]. There are no disapproving votes. Yash Anand (non-binding) Mayank Juneja (non-binding) Shengkai Fang (binding) Ron Liu (binding) Yunfeng Zhou (non-binding) Martijn Visser (binding) Piotr Nowojski (binding) Sergey Nuyanzin (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 On Mon, May 12, 2025 at 9:45 AM Hao Li wrote: > Hi Timo, > > Thanks for letting me know. I'll reopen the vote. Should we update the > FLIP wiki [1] to include the 72 business hours? > > [1] > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals > > Thanks, > Hao > > On Mon, May 12, 2025 at 5:50 AM Timo Walther wrote: > >> Hi Hao, >> >> please note that 72 hours in VOTE threads refers to business days so >> excluding weekends. This vote should be open for at least 1 more day. >> >> Regards, >> Timo >> >> >> >> On 11.05.25 18:19, Hao Li wrote: >> > Hi Dev, >> > >> > I'm happy to announce that FLIP-525: Model ML_PREDICT, ML_EVALUATE >> > Implementation Design [1] has been accepted with 6 approving votes (3 >> > binding) [2] >> > >> > Yash Anand (non-binding) >> > Mayank Juneja (non-binding) >> > Shengkai Fang (binding) >> > Ron Liu (binding) >> > Yunfeng Zhou (non-binding) >> > Martijn Visser (binding) >> > >> > Thanks, >> > Hao >> > >> > [1] >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design >> > [2] https://lists.apache.org/thread/by9mpnmk22nws1sk54z7t69802ws2z55 >> > >> >>
Re: [VOTE] FLIP-530: Dynamic job configuration
+1 (non-binding) Thanks, Hao On Tue, May 20, 2025 at 12:39 PM Piotr Nowojski wrote: > Hi, > > +1 (binding) > > Best, Piotrek > > wt., 20 maj 2025 o 20:29 Roman Khachatryan napisał(a): > > > Hi everyone, > > > > I'd like to start a vote on FLIP-530: Dynamic job configuration > > [1] which has been discussed in this thread [2]. > > > > The vote will be open for at least 72 hours unless there is an objection > > or not enough votes. > > > > [1] > > https://cwiki.apache.org/confluence/x/uglKFQ > > > > [2] > > https://lists.apache.org/thread/w1m420jx6h5cjv4rfy229xs00mmn7pwg > > > > Regards, > > Roman > > >
Re: [DISCUSS] FLIP-531: Initiate Flink Agents as a new Sub-Project
Hi Xintong, Sean and Chris, Thanks for driving the initiative. Very exciting to bring AI Agents to Flink to empower streaming use cases. +1 to the FLIP. Thanks, Hao On Wed, May 21, 2025 at 7:35 AM Nishita Pattanayak < nishita.pattana...@gmail.com> wrote: > Hi Sean, Chris and Xintong. This seems to be a very exciting sub-project. > +1 for the "flink-agents" sub-project. > > I was going through the FLIP, and had some questions regarding the same: > 1. How would external model calls (e.g., OpenAI or internal LLMs) > be integrated into Flink tasks without introducing backpressure or latency > issues? > In my experience, calling an external LLM carries the following > risks: it is latency-sensitive (LLM inference can take hundreds of milliseconds > to seconds), flaky (network issues, rate limits), and > non-deterministic (timeouts, retries, etc.). It would be great to > work/brainstorm on how we solve these issues. > 2. In traditional agent workflows, user feedback often plays a key role in > validating and improving agent outputs. In a continuous, long-running > Flink-based agent system, where interactions might not be user-facing or > synchronous, how do we incorporate human-in-the-loop feedback or > correctness signals to validate and iteratively improve agent behavior? > > This is a really exciting direction for the Flink ecosystem. The idea of > building long-running, context-aware agents natively on Flink feels like a > natural evolution of stream processing. I'd love to see this mature and > would be excited to contribute in any way I can to help productionize and > validate this in real-world use cases. > > On Wed, May 21, 2025 at 8:52 AM Xintong Song > wrote: > > > Hi devs, > > > > Sean, Chris and I would like to start a discussion on FLIP-531 [1], about > > introducing a new sub-project, Flink Agents. > > > > With the rise of agentic AI, we have identified great new opportunities > for > > Flink, particularly in the system-triggered agent scenarios. 
We believe > the > > future of AI agent applications is industrialized, where agents will not > > only be triggered by users, but increasingly by systems as well. Flink's > > capabilities in real-time distributed event processing, state > > management and exactly-once consistency and fault tolerance make it well-suited > > as a framework for building such system-triggered agents. Furthermore, > > system-triggered agents are often tightly coupled with data processing. > > Flink's outstanding data processing capabilities allow seamless > > integration between data and agentic processing. These capabilities > > differentiate Flink from other agent frameworks with unique advantages in > > the context of system-triggered agents. > > > > We propose this effort as a sub-project of Apache Flink, with a separate > > code repository and lightweight development process, for rapid iteration > > during the early stage. > > > > Please note that this FLIP is focused on the high-level plans, including > > motivation, positioning, goals, roadmap, and operating model of the > > project. Detailed technical design is out of scope and will be > > discussed during the rapid prototyping and iterations. > > > > For more details, please check the FLIP [1]. Looking forward to your > > feedback. > > > > Best, > > > > Xintong > > > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-531%3A+Initiate+Flink+Agents+as+a+new+Sub-Project > > >
Re: [VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Dev, Thanks all for voting. I'm closing the vote and the result will be posted in a separate email. Thanks, Hao On Fri, May 30, 2025 at 12:01 AM Robert Metzger wrote: > +1 (binding) > > On Fri, May 30, 2025 at 8:55 AM Piotr Nowojski > wrote: > > > +1 (binding) > > > > Best, > > Piotrek > > > > Shengkai Fang wrote on Thu, May 29, 2025 at 05:58: > > > +1(binding) > > > > > > Best, > > > Shengkai > > > > > > Timo Walther wrote on Thu, May 29, 2025 at 00:00: > > > > +1 (binding) > > > > > > > > We should still check whether changes to QueryOperationVisitor are > > > > necessary but this is internal API and should not block the FLIP. The > public > > > > API looks correct to me. > > > > > > > > Cheers, > > > > Timo > > > > > > > > > > > > On 12.05.25 18:50, Hao Li wrote: > > > > > Hi everyone, > > > > > > > > > > I'd like to start a vote on FLIP-526: Model ML_PREDICT, ML_EVALUATE > > > Table > > > > > API [1], which has been discussed in this thread [2]. > > > > > > > > > > The vote will be open for at least 72 business hours unless there > is > > an > > > > > objection > > > > > or not enough votes. > > > > > > > > > > Thanks, > > > > > Hao > > > > > > > > > > [1] > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API > > > > > [2] > https://lists.apache.org/thread/hlmm9l6qr3pdhs8oc8ltn4pf0s6tg3x4 > > > > > > > > > > > > > > > > > > >
[RESULT][VOTE] FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hi Dev, I'm happy to announce that FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API [1] has been accepted with 4 approving votes (4 binding) [2]. Timo Walther (binding) Shengkai Fang (binding) Piotr Nowojski (binding) Robert Metzger (binding) Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API [2] https://lists.apache.org/thread/5z18ff83oljdvfl86h0w5ovnvy5h2h51
Re: [VOTE] FLIP-531: Initiate Flink Agents as a new Sub-Project
+1 (non-binding) Thanks, Hao On Mon, Jun 2, 2025 at 11:50 AM Jiaming Xu wrote: > +1 (non-binding) > > > On Jun 2, 2025, at 11:03, Mingge Deng > wrote: > > > > +1 > > > > On Mon, Jun 2, 2025 at 7:53 AM Yuan Mei wrote: > > > >> +1 (binding) > >> > >> On Mon, Jun 2, 2025 at 5:07 PM Martijn Visser > > >> wrote: > >> > >>> +1 (binding) > >>> > >>> On Mon, Jun 2, 2025 at 9:07 AM Robert Metzger > >> wrote: > >>> > +1 (binding) > > > On Mon, Jun 2, 2025 at 8:38 AM Jing Ge > >>> wrote: > > > +1(binding) > > > > Best regards, > > Jing > > > > On Mon, Jun 2, 2025 at 3:31 AM Xintong Song > wrote: > > > >> Hi everyone, > >> > >> I'd like to start a vote on FLIP-531: Initiate Flink Agents as a > >> new > >> Sub-Project [1], which has been discussed in this thread [2]. > >> > >> The vote will be open for at least 72 hours. > >> > >> > >> Best, > >> > >> Xintong > >> > >> > >> [1] > >> > >> > > > > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-531%3A+Initiate+Flink+Agents+as+a+new+Sub-Project > >> > >> [2] > >> https://lists.apache.org/thread/p9m7wvtlw7sc2oh7v0jk3hb60x5x53yy > >> > > > > >>> > >> > >
Re: [DISCUSS] FLIP-440: Annotation naming for PTFs
+1 for option A. It's consistent with Calcite naming. On Thu, Jun 19, 2025 at 11:38 AM Sergey Nuyanzin wrote: > Thanks for raising this > +1 for option A > > On Thu, Jun 19, 2025 at 4:05 PM Gustavo de Morais > wrote: > > > > Hi Timo, > > > > +1 (non-binding) for option A. Thanks for trying to address feedback > > quickly. > > > > Kind regards, > > Gustavo de Morais > > > > On Thu, 19 Jun 2025 at 15:51, Timo Walther wrote: > > > > > Hi everyone, > > > > > > I'm currently polishing FLIP-440, and I would like to apply some last-minute > > > changes before the first release of PTFs for Flink 2.1. I've already > > > collected initial user feedback and it seems that the name for > > > annotations of table arguments is not precise enough. > > > > > > As always, naming is a hard problem in software engineering. > > > > > > For background, please take a look at this docs section: > > > > > > > > > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/ptfs/#table-semantics-and-virtual-processors > > > > > > Currently, a PTF signature can look like this when taking a table as an > > > argument: > > > > > > // Untyped with set semantics > > > eval(@ArgumentHint(TABLE_AS_SET) Row order); > > > > > > // Typed with set semantics > > > eval(@ArgumentHint(TABLE_AS_SET) Order order); > > > > > > // Untyped with row semantics > > > eval(@ArgumentHint(TABLE_AS_ROW) Row order); > > > > > > // Typed with row semantics > > > eval(@ArgumentHint(TABLE_AS_ROW) Order order); > > > > > > The annotation value confuses people, so I would ask for renaming this > > > part of the API. 
> > > > > > Option A: > > > ROW_SEMANTIC_TABLE > > > SET_SEMANTIC_TABLE > > > > > > Option B: > > > ROW_WISE_TABLE > > > SET_WISE_TABLE > > > > > > Option C: > > > ROW_SCOPED_TABLE > > > SET_SCOPED_TABLE > > > > > > Option D: > > > KEYED_TABLE > > > UNKEYED_TABLE > > > > > > Option E: > > > PARTITIONED_TABLE > > > ROW_WISE_TABLE > > > > > > Option A/B/C are closer to SQL standard and not too far away from > > > current docs. Option D is closer to Flink DataStream API but could be > > > confusing if no PARTITION BY clause is given but still the table could > > > be keyed. Option E neither takes SQL standard nor DataStream API as a > > > reference. > > > > > > Personally, I would vote for Option A. > > > > > > Looking forward to your opinion. > > > > > > Cheers, > > > Timo > > > > > > > > > > > > > -- > Best regards, > Sergey >
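For illustration, here is a sketch of how the signatures from Timo's example would read under Option A, assuming the annotation values are simply renamed one-for-one (the final names depend on how the FLIP is updated):

```java
// Untyped / typed with set semantics (previously TABLE_AS_SET)
eval(@ArgumentHint(SET_SEMANTIC_TABLE) Row order);
eval(@ArgumentHint(SET_SEMANTIC_TABLE) Order order);

// Untyped / typed with row semantics (previously TABLE_AS_ROW)
eval(@ArgumentHint(ROW_SEMANTIC_TABLE) Row order);
eval(@ArgumentHint(ROW_SEMANTIC_TABLE) Order order);
```

The "semantic" wording keeps the direct link to the SQL-standard terms "row semantics" and "set semantics" already used in the PTF documentation.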
Re: [VOTE] FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hi all, As part of the discussion in AsyncTableFunction PR [1], it's suggested to change the config name from `table.exec.async-table.buffer-capacity` to `table.exec.async-table.max-concurrent-operations`. I think the change makes sense and to make the model async config name consistent with async table, I'm proposing to change the proposed config in this FLIP-525 from `table.exec.async-ml-predict.buffer-capacity` to `table.exec.async-ml-predict.max-concurrent-operations`. What do people think? If there's no objection, I'll go ahead and make the change in FLIP and code since the feature isn't released yet. Thanks, Hao [1] https://github.com/apache/flink/pull/26567/files#r2144129856 On Mon, May 12, 2025 at 9:47 AM Hao Li wrote: > Hi all, > > I was reminded that 72 hours means business days. So the vote is still > open :) Sorry about the confusion. > > Thanks, > Hao > > On Mon, May 12, 2025 at 1:18 AM Sergey Nuyanzin > wrote: > >> +1 (binding) >> >> On Mon, May 12, 2025 at 9:55 AM Piotr Nowojski >> wrote: >> > >> > +1 (binding) >> > >> > Best, >> > Piotrek >> > >> > niedz., 11 maj 2025 o 18:14 Hao Li >> napisał(a): >> > >> > > Hi Dev, >> > > >> > > Thanks all for voting. I'm closing the vote and the result will be >> posted >> > > in a separate email. >> > > >> > > Thanks, >> > > Hao >> > > >> > > On Fri, May 9, 2025 at 12:43 AM Martijn Visser < >> martijnvis...@apache.org> >> > > wrote: >> > > >> > > > +1 (binding) >> > > > >> > > > On Fri, May 9, 2025 at 9:27 AM Yunfeng Zhou < >> flink.zhouyunf...@gmail.com >> > > > >> > > > wrote: >> > > > >> > > > > +1(non-binding) >> > > > > >> > > > > Best, >> > > > > Yunfeng >> > > > > >> > > > > > 2025年5月9日 00:01,Hao Li 写道: >> > > > > > >> > > > > > Hi everyone, >> > > > > > >> > > > > > I'd like to start a vote on FLIP-525: Model ML_PREDICT, >> ML_EVALUATE >> > > > > > Implementation Design [1], which has been discussed in this >> thread >> > > [2]. 
>> > > > > > >> > > > > > The vote will be open for at least 72 hours unless there is an >> > > > objection >> > > > > > or not enough votes. >> > > > > > >> > > > > > Thanks, >> > > > > > Hao >> > > > > > >> > > > > > [1] >> > > > > > >> > > > > >> > > > >> > > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design >> > > > > > [2] >> https://lists.apache.org/thread/880cxnw9ygsjl3x5w0kxrh4x5fw6mp4x >> > > > > >> > > > > >> > > > >> > > >> >> >> >> -- >> Best regards, >> Sergey >> >
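To make the proposed rename concrete, here is a sketch of what it would look like in a SQL session (the key is only a proposal until the FLIP and code are updated):

```sql
-- Previously proposed in FLIP-525:
-- SET 'table.exec.async-ml-predict.buffer-capacity' = '10';

-- Proposed replacement, consistent with the AsyncTableFunction config
-- 'table.exec.async-table.max-concurrent-operations':
SET 'table.exec.async-ml-predict.max-concurrent-operations' = '10';
```

Since the feature is unreleased, no deprecated-key fallback would be needed for the ML_PREDICT config.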
Re: [ANNOUNCE] Flink 2.1 feature freeze
Hi Ron, For FLIP-525, as discussed in [1], there's a config name change. I have a PR [2] for it which needs to be merged into 2.1 release. Thanks, Hao [1] https://lists.apache.org/thread/t979kpqs8v6zg5t8bgql3wwf27t7th9b [2] https://github.com/apache/flink/pull/26706 On Fri, Jun 20, 2025 at 7:56 PM Ron Liu wrote: > Hi, dev > > The feature freeze of 2.1 has started now. That means that no new features > or improvements should now be merged into the master branch unless you ask > the release managers first, which has already been done for PRs, or pending > on CI to pass. Bug fixes and documentation PRs can still be merged. > > > - *Cutting release branch* > > Currently we have no blocker issues(beside tickets that used for > release-testing). > > We are planning to cut the release branch on next Wednesday (June 25) if > no new test instabilities, and we'll make another announcement in the > dev mailing list then. > > > - *Cross-team testing* > > The release testing will start right after we cut the release branch, which > is expected to come in the next week. As a prerequisite of it, please > complete the documentation of your new feature and mark the related JIRA > issue in the 2.1 release wiki page [1] before we start testing, which > would be quite helpful for other developers to validate your features. > > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.1+Release > > Best, > Ron >
Re: [SUMMARY] Flink 2.1 Release Sync 06/04/2025
Hi Ron, Thanks for driving this. Can you also add FLIP-507 [1] to the 2.1 release? There's one PR for this FLIP and it can be finished before the freeze. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-507%3A+Add+Model+DDL+methods+in+TABLE+API On Wed, Jun 4, 2025 at 12:44 AM Ron Liu wrote: > Hi, devs, > > This is the fourth meeting of the Flink 2.1 release cycle. I'd like to share > the information synced in the meeting. > > > - Feature Freeze > > It is worth noting that there are only 2 weeks left until the > feature freeze (June 21, 2025), and developers need to keep that date > in mind. > > > - Features: > > So far we've had 16 FLIPs/features, and five FLIPs have been completed in > 2.1. > Regarding the remaining 11 FLIPs, 8 FLIPs are confirmed to be completed > in 2.1, > and the status of the other three is uncertain. > > Progress on two very important FLIPs, FLIP-486 & FLIP-521, is still zero, > and there are some concerns about whether they can be ready for the feature freeze > of 2.1. I will confirm it with the corresponding contributors. > > - CI: > > So far, we have four unstable cases and no blocker issues; > all of the unstable cases have JIRA issues created: >1. https://issues.apache.org/jira/browse/FLINK-37703: This was an > occasional unstable case caused by a test-dependent HDFS, and we found a > community contributor to locate it >2. https://issues.apache.org/jira/browse/FLINK-37822: Piotr will fix > it. >3. https://issues.apache.org/jira/browse/FLINK-37701: We've asked Piotr to > fix it. >4. https://issues.apache.org/jira/browse/FLINK-37842: This is a problem > with the test case itself. No contributor has been found yet. We are still > monitoring it. > >In addition, we have recently frequently encountered 502 & 503 Proxy > Errors in Maven deploy. The response from ASF infra is that this is a hardware problem, > and they are in the process of migrating data to a replacement server. 
For > details, see: https://issues.apache.org/jira/browse/INFRA-26874 > > - Benchmark > Since May 19, some serde-related performance regressions have occurred; > see the issue below for details. We need a volunteer to locate the root cause by > reproducing it locally. Since it only occurs on JDK11, Zakelly believes it > may not be a problem, so it is not a blocker now. > > 1. https://issues.apache.org/jira/browse/FLINK-37826 > > - Sync meeting: > > As code freeze is approaching, the release sync is adjusted from once every two weeks to > once a week, so the next meeting time is 06/11/2025. Please feel free to > join us [2]! > > > Lastly, we encourage attendees to fill out the topics to be discussed at > the bottom of the 2.1 wiki page [1] a day in advance, to make it easier for > everyone to understand the background of the topics, thanks! > > [1] https://cwiki.apache.org/confluence/display/FLINK/2.1+Release > [2] https://meet.google.com/pog-pebb-zrj >
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Shengkai, Leonard, Thanks for the feedback. For Shengkai's feedback: > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the external system cannot handle this exception. You are right, we can make it throw `Exception`. > 2. I think we can also modify the create catalog ddl syntax. ``` CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod WITH ( 'type' = 'jdbc' ); ``` Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a chicken-and-egg problem. I think for connections to be used for catalog creation, it needs to be a system connection, similar to system functions, which exist outside of the CatalogManager. Maybe we can defer this to later functionality? > 3. It seems the connector factory should merge the with options and connection options together and then create the source/sink. It's better if the framework can merge all these options so connectors don't need any code changes. I think it's better to separate connection options from table options and make it explicit. The reason is: the connection here should be a decrypted one. It's sensitive and should be handled carefully regarding logging, usage, etc. Mixing it with the original table options makes that hard to do. But the FLIP used `CatalogConnection`, which is an encrypted one. I'll update it to `SensitiveConnection`. > 4. Why do we need to introduce ConnectionFactory? I think connection is like CatalogTable. It should hold the basic information and all information in the connection should be stored into secret store. The main reason is to enable user-defined connection handling for different types. For example, if the connection type is `basic`, the corresponding factory can handle basic-type secrets (e.g. extract username/password from the connection options and do encryption). For Leonard's feedback: > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend SecretStore? Good catch. Let me update it. 
> (2) Configurability of SECRET_FIELDS : Could the hardcoded SECRET_FIELDS in BasicConnectionFactory be made configurable (e.g., 'token' vs 'accessKey') for better connector compatibility? This depends on the `ConnectionFactory` implementation and can be defined by the user. > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection returns SensitiveConnection, while BasicConnectionFactory#resolveConnection returns Map. Should these be aligned? Good catch. Let me update the FLIP. > (4)Framework-level Resolution : +1 to Shengkai's point about having the framework (DynamicTableFactory) return complete options to reduce connector adaptation cost. Please see my explanation for Shengkai's similar question. > (5)Secret ID Handling : When no encryption is needed, secretId is null (from secrets.isEmpty() ? null : secretStore.storeSecret(secrets)). This behavior should be explicitly documented in the interfaces. It's an example in the FLIP about how `ConnectionFactory` could be implemented. How encryption is done depends on the actual implementation though. Let me update the example to make it clearer. Thanks, Hao On Thu, Jul 24, 2025 at 5:04 AM Leonard Xu wrote: > Hi friends > > I like the updated FLIP goals, that’s what I want. I have some feedback: > > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend > SecretStore? > (2) Configurability of SECRET_FIELDS : Could the hardcoded SECRET_FIELDS > in BasicConnectionFactory be made configurable (e.g., 'token' vs > 'accessKey') for better connector compatibility? > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection returns > SensitiveConnection, while BasicConnectionFactory#resolveConnection returns > Map. Should these be aligned? > (4)Framework-level Resolution : +1 to Shengkai's point about having the > framework (DynamicTableFactory) return complete options to reduce connector > adaptation cost. 
> (5)Secret ID Handling : When no encryption is needed, secretId is null > (from secrets.isEmpty() ? null : secretStore.storeSecret(secrets)). This > behavior should be explicitly documented in the interfaces. > > Best, > Leonard > > > Shengkai Fang wrote on Jul 24, 2025 at 11:44: > > > > hi. > > > > Sorry for the late reply. I just have some questions: > > > > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the > > external system cannot handle this exception. > > > > 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > >'type' = 'jdbc' > > ); > > ``` > > > > 3. It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better if the framework can merge all these options so connectors don't need > > any code changes. > > > > 4. Why do we need to introduce ConnectionFactory? I think connection is like > > CatalogTable. It should hold the basic information and all information in > > the connection should be stored into secret store. > > > > > > Best, > > Shengkai > > > >
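To make the separation between connection options and table options concrete, here is a hypothetical sketch of the direction discussed above. The exact DDL and option names are defined by FLIP-529; in particular, the `'connection'` table option shown below is illustrative only, not the FLIP's confirmed syntax:

```sql
-- Define a connection that carries the sensitive options,
-- so they can be encrypted and handled separately from table options:
CREATE CONNECTION mysql_prod
WITH (
  'type' = 'basic',
  'username' = 'flink',
  'password' = '******'
);

-- Reference it from a table instead of inlining credentials:
CREATE TABLE orders (
  order_id BIGINT,
  amount DECIMAL(10, 2)
) WITH (
  'connector' = 'jdbc',
  'connection' = 'mysql_prod'
);
```

Keeping the secrets behind a named connection is what allows the framework to control logging and redaction, which is the motivation for resolving them via a separate `SensitiveConnection` rather than merging them silently into the table options.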
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Yanquan, Thanks for the feedback. > (1) I agree with the design of SecretStore. I hope to add the results of the 'SHOW Connections' query to the document; 'secret.id' should be included in the results. This helps us establish the association between tables and connections. I think we can output the "secret.id" field in the `describe connection` query. `show connections` can show the names of connections, similar to other show commands such as `show tables` or `show models`? > Can you clearly point out this point. It seems that we cannot create a CatalogConnection under any catalog. For example, we cannot save CatalogConnection in the JDBC catalog. I didn't mean we can't create a connection under a catalog. I meant we can't create a catalog using a connection if the connection exists in a catalog. We need to introduce some System Connection later to be used to create catalogs. If you already have a JDBC catalog, we can still save a CatalogConnection in it. Thanks, Hao On Sun, Jul 27, 2025 at 7:24 PM Yanquan Lv wrote: > Hi Mayank and Hao, > Thanks for the updates and comments. Here are some of my opinions: > > (1) I agree with the design of SecretStore. I hope to add the results of > the 'SHOW Connections' query to the document; 'secret.id' should be > included in the results. This helps us establish the association between > tables and connections. > (2) > > I think for connections to be used for catalog > > creation, it needs to be a system connection, similar to system functions, > which > > exist outside of the CatalogManager. > Can you clearly point out this point. It seems that we cannot create a > CatalogConnection under any catalog. For example, we cannot save > CatalogConnection in the JDBC catalog. > > Hao Li wrote on Fri, Jul 25, 2025 at 06:54: > > Hi Shengkai, Leonard, > > > > Thanks for the feedback. > > > > For Shengkai's feedback: > > > > > 1. Why does SecretStoreFactory#open throw a CatalogException? I think the > > external system cannot handle this exception. 
> > You are right, we can make it throw `Exception`. > > > > > 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > > 'type' = 'jdbc' > > ); > > ``` > > > > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a > > chicken-and-egg problem. I think for connections to be used for catalog > > creation, it needs to be a system connection, similar to system functions, > which > > exist outside of the CatalogManager. Maybe we can defer this to later > > functionality? > > > > > 3. It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better if the framework can merge all these options so connectors don't > need > > any code changes. > > > > I think it's better to separate connection options from table options and make > it > > explicit. The reason is: the connection here should be a decrypted one. It's > > sensitive and should be handled carefully regarding logging, usage, etc. > > Mixing it with the original table options makes that hard to do. But the FLIP used > > `CatalogConnection`, which is an encrypted one. I'll update it to > > `SensitiveConnection`. > > > > > 4. Why do we need to introduce ConnectionFactory? I think connection is > like > > CatalogTable. It should hold the basic information and all information in > > the connection should be stored into secret store. > > > > The main reason is to enable user-defined connection handling for > different > > types. For example, if the connection type is `basic`, the corresponding > > factory can handle basic-type secrets (e.g. extract username/password > from > > the connection options and do encryption). > > > > For Leonard's feedback: > > > (1) Minor: Interface Hierarchy : Why doesn't WritableSecretStore extend > > SecretStore? > > > > Good catch. Let me update it. 
> > > (2) Configurability of SECRET_FIELDS : Could the hardcoded > SECRET_FIELDS > > in BasicConnectionFactory be made configurable (e.g., 'token' vs > > 'accessKey') for better connector compatibility? > > > > This depends on the `ConnectionFactory` implementation and can be defined > > by the user. > > > > > (3)Inconsistent Return Types : ConnectionFactory#resolveConnection > > returns SensitiveConnection, while > BasicConnectionFactory#resolveConnection > > returns Map. Should these be aligned?
Re: [VOTE] FLIP-400: AsyncScalarFunction for asynchronous scalar function support
+1 to change the config name from buffer-capacity to max-concurrent-operations. Thanks, Hao On Thu, Jul 24, 2025 at 2:26 PM Alan Sheinberg wrote: > Hi, > > New functionality for AsyncTableFunction ( > https://github.com/apache/flink/pull/26567) was recently added, which is > essentially the same async behavior, but for table functions. It contains > similar configurations as for AsyncScalarFunction. While reviewing, we > decided to change the config name of buffer-capacity to be > max-concurrent-operations, since it's more intuitive. I am going to make a > similar change to AsyncScalarFunction, including backwards support for the > existing config name using withDeprecatedKeys. If there are any issues, > please let me know. > > Thanks, > Alan > > On Wed, Jan 3, 2024 at 11:29 AM Alan Sheinberg > wrote: > > > Thank you everyone for participating in the vote. I'm closing this vote > > now and will announce the results in a separate thread. > > > > -Alan > > > > On Tue, Jan 2, 2024 at 8:40 AM Piotr Nowojski > > wrote: > > > >> +1 (binding) > >> > >> Best, > >> Piotrek > >> > >> czw., 28 gru 2023 o 09:19 Timo Walther napisał(a): > >> > >> > +1 (binding) > >> > > >> > Cheers, > >> > Timo > >> > > >> > > Am 28.12.2023 um 03:13 schrieb Yuepeng Pan : > >> > > > >> > > +1 (non-binding). > >> > > > >> > > Best, > >> > > Yuepeng Pan. > >> > > > >> > > > >> > > > >> > > > >> > > At 2023-12-28 09:19:37, "Lincoln Lee" > wrote: > >> > >> +1 (binding) > >> > >> > >> > >> Best, > >> > >> Lincoln Lee > >> > >> > >> > >> > >> > >> Martijn Visser 于2023年12月27日周三 23:16写道: > >> > >> > >> > >>> +1 (binding) > >> > >>> > >> > >>> On Fri, Dec 22, 2023 at 1:44 AM Jim Hughes > >> > > >> > >>> wrote: > >> > > >> > Hi Alan, > >> > > >> > +1 (non binding) > >> > > >> > Cheers, > >> > > >> > Jim > >> > > >> > On Wed, Dec 20, 2023 at 2:41 PM Alan Sheinberg > >> > wrote: > >> > > >> > > Hi everyone, > >> > > > >> > > I'd like to start a vote on FLIP-400 [1]. 
It covers introducing > a > >> new > >> > >>> UDF > >> > > type, AsyncScalarFunction for completing invocations > >> asynchronously. > >> > >>> It > >> > > has been discussed in this thread [2]. > >> > > > >> > > I would like to start a vote. The vote will be open for at > least > >> 72 > >> > >>> hours > >> > > (until December 28th 18:00 GMT) unless there is an objection or > >> > > insufficient votes. > >> > > > >> > > [1] > >> > > > >> > > > >> > >>> > >> > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-400%3A+AsyncScalarFunction+for+asynchronous+scalar+function+support > >> > > [2] > >> https://lists.apache.org/thread/q3st6t1w05grd7bthzfjtr4r54fv4tm2 > >> > > > >> > > Thanks, > >> > > Alan > >> > > > >> > >>> > >> > > >> > > >> > > >
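For readers following the rename above, it would surface to SQL users roughly as below. This is a sketch: the full option keys are assumptions based on the existing `table.exec.async-scalar.*` option group, and the thread only confirms the suffix change from `buffer-capacity` to `max-concurrent-operations` (with the old key kept as a deprecated alias via `withDeprecatedKeys`):

```sql
-- Sketch only: assuming the async scalar options live under
-- 'table.exec.async-scalar'. The new key caps in-flight async calls.
SET 'table.exec.async-scalar.max-concurrent-operations' = '10';

-- The old key would keep working as a deprecated alias:
-- SET 'table.exec.async-scalar.buffer-capacity' = '10';
```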
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Leonard, Thanks for the feedback and offline sync yesterday. > I think connection is a very common need for most catalogs like MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need a connection. I added a `TEMPORARY SYSTEM` connection so it's a global-level connection which can be used for catalog creation. After syncing with Timo, we propose to store it first in memory like `TEMPORARY SYSTEM FUNCTION`, since this FLIP is already introducing lots of concepts and interfaces. We can provide a `SYSTEM` connection and related interfaces to persist it in follow-up FLIPs. > In this case, I think reducing connector development cost is more important than making it explicit; the connector knows which options are sensitive or not. Sure. I updated the FLIP to merge connection options with table options so it's easier for current connectors. > I hope the BasicConnectionFactory can be a common one that can feed most common user cases, otherwise encrypting all options is a good idea. I updated it to `DefaultConnectionFactory`, which handles most of the secret keys. Thanks, Hao On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > Hey, Hao > > Please see my comments as follows: > > >> 2. I think we can also modify the create catalog ddl syntax. > > > > ``` > > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > > WITH ( > >'type' = 'jdbc' > > ); > > ``` > > > > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a > > chicken-egg problem. I think for connections to be used for catalog > > creation, it needs to be a system connection similar to a system function > which > > exists outside of CatalogManager. Maybe we can defer this to later > > functionality? > > I think connection is a very common need for most catalogs like > MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need > a connection. > > > >> 3.
It seems the connector factory should merge the with options and > > connection options together and then create the source/sink. It's > > better that the framework can merge all these options so connectors don't need > > any code. > > > > I think it's better to separate connections from table options and make it > > explicit. The reason is: the connection here should be a decrypted one. It's > > sensitive and should be handled carefully regarding logging, usage, etc. > > Mixing it with the original table options makes that hard to do. But the FLIP used > > `CatalogConnection`, which is an encrypted one. I'll update it to > > `SensitiveConnection`. > >> (4) Framework-level Resolution: +1 to Shengkai's point about having the > > framework (DynamicTableFactory) return complete options to reduce > connector > > adaptation cost. > > > > Please see my explanation for Shengkai's similar question. > > In this case, I think reducing connector development cost is more > important than making it explicit; the connector knows which options are > sensitive or not. > > > >> 4. Why do we need to introduce ConnectionFactory? I think a connection is > like > > CatalogTable. It should hold the basic information, and all information in > > the connection should be stored in the secret store. > > > > The main reason is to enable user-defined connection handling for > different > > types. For example, if the connection type is `basic`, the corresponding > > factory can handle basic-type secrets (e.g. extract username/password > from > > connection options and do encryption). > > > >> (2) Configurability of SECRET_FIELDS: Could the hardcoded SECRET_FIELDS > > in BasicConnectionFactory be made configurable (e.g., 'token' vs > > 'accessKey') for better connector compatibility? > > > > This depends on the `ConnectionFactory` implementation and can be self-defined > > by the user.
> > I hope the BasicConnectionFactory can be a common one that can feed most > common user cases, otherwise encrypting all options is a good idea. > > Btw, I also want to push the FLIP forward and start a vote ASAP, thus a > meeting is welcome if you think it can help finalize the discussion > thread. > > > Best, > Leonard > > > > > > >> Hi friends, > >> > >> I like the updated FLIP goals, that's what I want. I have some feedback: > >> > >> (1) Minor: Interface Hierarchy: Why doesn't WritableSecretStore extend > >> SecretStore? > >> (2) Configurability of SECRET_FIELDS: Could the hardcoded SECRET_FIELDS > >> in BasicConnectionFactory be made configurable (e.g., 'token' vs > >> 'accessKey') for better connector compatibility? > >> (3) Inconsistent Return Types: ConnectionFactory#resolveConnection > returns > >> SensitiveConnection, while BasicConnectionFactory#resolveConnection > returns > >> Map. Should these be aligned? > >> (4) Framework-level Resolution: +1 to Shengkai's point about having the > >> framework (DynamicTableFactory) return complete options to reduce > connector > >> adaptation cost. > >> (5) Secret ID Handling: When no encryption is needed, secretId is null > >> (from secrets.isEmpty() ? null : secretStore.storeSe
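Pulling the pieces of this thread together, a minimal end-to-end sketch of the proposed DDL might look as follows. The `CREATE CATALOG ... USING CONNECTION` shape is taken from the example quoted above; the connection option keys (`type`, `username`, `password`) and the single-part system connection name are illustrative assumptions, not final FLIP-529 syntax:

```sql
-- Sketch only: option names are illustrative, secrets would be handled
-- by the secret store rather than stored in plain text.
CREATE TEMPORARY SYSTEM CONNECTION mysql_prod
WITH (
  'type' = 'basic',
  'username' = 'flink',
  'password' = '******'
);

-- Reuse the connection when creating a catalog, as discussed above.
CREATE CATALOG cat USING CONNECTION mysql_prod
WITH (
  'type' = 'jdbc'
);
```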
Re: [DISCUSS] FLIP-540: Support VECTOR_SEARCH in Flink SQL
Hi Shengkai, Thanks for the FLIP and enhancement for AI capabilities in Flink. +1. Thanks, Hao On Tue, Jul 29, 2025 at 2:16 AM Shengkai Fang wrote: > Hi, > I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in Flink > SQL[1]. > > In FLIP-437/FLIP-525, Apache Flink has initially integrated Large Language > Model (LLM) capabilities, enabling semantic understanding and real-time > processing of streaming data pipelines. This integration has been > technically validated in scenarios such as log classification and real-time > question-answering systems. However, the current architecture allows Flink > to only use embedding models to convert unstructured data (e.g., text, > images) into high-dimensional vector features, which are then persisted to > downstream storage systems (e.g., Milvus, Mongodb). It lacks real-time > online querying and similarity analysis capabilities for vector spaces. To > address this limitation, we propose introducing the VECTOR_SEARCH function > in this FLIP, enabling users to perform streaming vector similarity > searches and real-time context retrieval (e.g., Retrieval-Augmented > Generation, RAG) directly within Flink. > > Looking forward to comments and suggestions for improvements! > > Best, > Shengkai > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL >
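To make the RAG use case above concrete, a query might take roughly the following shape. This is a hypothetical sketch modeled on Flink's ML_PREDICT-style table-function syntax; the argument list, `DESCRIPTOR` usage, and table names are assumptions, not the actual FLIP-540 grammar:

```sql
-- Hypothetical shape of a streaming top-k similarity lookup.
SELECT q.question, s.chunk, s.score
FROM questions AS q,
  LATERAL TABLE(
    VECTOR_SEARCH(
      TABLE vector_index,        -- e.g. backed by Milvus or MongoDB
      DESCRIPTOR(embedding),     -- column holding the stored vectors
      q.question_embedding,      -- query vector from an embedding model
      10                         -- top-k results per input row
    )
  ) AS s;
```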
Re: [VOTE] FLIP-539: Support WITH Clause in CREATE FUNCTION Statement in Flink SQL
+1 (non-binding) Thanks, Hao On Fri, Aug 1, 2025 at 5:33 PM Sergey Nuyanzin wrote: > +1 (binding) > > On Fri, Aug 1, 2025 at 5:18 PM Ramin Gharib wrote: > > > > Hi David, > > > > +1 (non-binding) > > > > Thanks for the FLIP and the effort! > > Best, > > > > Ramin > > > > On Fri, Aug 1, 2025 at 4:18 PM Dawid Wysakowicz > > wrote: > > > > > Hi everyone, > > > > > > I'd like to start a vote on FLIP-539: Support WITH Clause in CREATE > > > FUNCTION Statement in Flink SQL [1], which has been discussed in this > > > thread [2]. > > > > > > The vote will be open for at least 72 hours unless there is an > objection > > > or not enough votes. > > > > > > Thanks, > > > Dawid > > > > > > [1] https://cwiki.apache.org/confluence/x/sg9JFg > > > [2] https://lists.apache.org/thread/2mmc454n2ol2cxdhbh7mz77w62vhstd4 > > > > > > > -- > Best regards, > Sergey >
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Hi Timo, Thanks for the suggestion. I updated the FLIP with `ObjectIdentifier` taking only objectName and made it clear that the system connection uses only a simple identifier instead of compound identifier. Thanks, Hao On Thu, Jul 31, 2025 at 6:12 AM Timo Walther wrote: > Thanks for updating the FLIP and starting a VOTE thread. I have two last > minute questions before I cast my vote: > > It seems the definition of TEMPORARY SYSTEM CONNECTION is not precise > enough. In my point of view, system connections are single part > identifiers (similar to functions). And catalog connections are 3-part > identifiers. > > So will we support the following DDL: > > CREATE TEMPORARY SYSTEM CONNECTION global_connection WITH (...); > > CREATE TABLE t (...) USING CONNECTION global_connection > > I think we should. But then we need to update the > `CatalogTable.getConnection(): Optional` interface, > because ObjectIdentifier currently only supports 3 part identifiers. > > I suggest we modify ObjectIdentifier and let it have either 3 or 1 part. > Then we can also remove the need for FunctionIdentifier which exists for > exactly this purpose. > > What do you think? > > Cheers, > Timo > > > On 30.07.25 05:36, Leonard Xu wrote: > > Thanks Hao for the effort to push the FLIP forward, I believe current > design would satisfy most user cases. > > > > +1 to start a vote. > > > > Best, > > Leonard > > > >> 2025 7月 30 02:27,Hao Li 写道: > >> > >> Sorry, I forgot to mention I also updated the FLIP to include table apis > >> for connection. It was originally in examples but not in the public api > >> section. > >> > >> On Tue, Jul 29, 2025 at 10:12 AM Hao Li wrote: > >> > >>> Hi Leonard, > >>> > >>> Thanks for the feedback and offline sync yesterday. > >>> > >>>> I think connection is a very common need for most catalogs like > >>> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs > need > >>> a connection. 
> >>> I added `TEMPORARY SYSTEM` connection so it's a global level connection > >>> which can be used for Catalog creation. After syncing with Timo, we > propose > >>> to store it first in memory like `TEMPORARY SYSTEM FUNCTION` since this > >>> FLIP is already introducing lots of concepts and interfaces. We can > provide > >>> `SYSTEM` connection and related interfaces to persist it in following > up > >>> FLIPs. > >>> > >>>> In this case, I think reducing connector development cost is more > >>> important than making is explicit, the connector knows which options is > >>> sensitive or not. > >>> Sure. I updated the FLIP to merge connection options with table > options so > >>> it's easier for current connectors. > >>> > >>>> I hope the BasicConnectionFactory can be a common one that can feed > most > >>> common users case, otherwise encrypt all options is a good idea. > >>> I updated to `DefaultConnectionFactory` which handles most of the > secret > >>> keys. > >>> > >>> Thanks, > >>> Hao > >>> > >>> On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > >>> > >>>> Hey, Hao > >>>> > >>>> Please see my comments as follows: > >>>> > >>>>>> 2. I think we can also modify the create catalog ddl syntax. > >>>>> > >>>>> ``` > >>>>> CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod > >>>>> WITH ( > >>>>>'type' = 'jdbc' > >>>>> ); > >>>>> ``` > >>>>> > >>>>> Does `mycat.mydb.mysql_prod` exist in another catalog? This feels > like a > >>>>> chicken-egg problem. I think for connections to be used for catalog > >>>>> creation, it needs to be system connection similar to system function > >>>> which > >>>>> exist outside of CatalogManager. Maybe we can defer this to later > >>>>> functionality? > >>>> > >>>> I think connection is a very common need for most catalogs like > >>>> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these > catalogs need > >>>> a connection. > >>>> > >>>> > >>>>>> 3. It seems the connector factory should merge the with
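Timo's "either 3 parts or 1 part" proposal can be sketched as a tiny value class. This is an illustration of the invariant only, not the actual `ObjectIdentifier` implementation in Flink (which would also need serialization, equals/hashCode, and anonymous-identifier handling):

```java
import java.util.Objects;

// Illustration only: an identifier that is either a single-part system
// identifier (catalog/database absent) or a full catalog.database.object path.
final class Identifier {
    private final String catalog;  // null for 1-part identifiers
    private final String database; // null for 1-part identifiers
    private final String name;

    private Identifier(String catalog, String database, String name) {
        this.catalog = catalog;
        this.database = database;
        this.name = Objects.requireNonNull(name);
    }

    // 1-part form, e.g. a TEMPORARY SYSTEM CONNECTION or system function.
    static Identifier ofSystem(String name) {
        return new Identifier(null, null, name);
    }

    // 3-part form, e.g. a catalog connection or table.
    static Identifier of(String catalog, String database, String name) {
        return new Identifier(
            Objects.requireNonNull(catalog), Objects.requireNonNull(database), name);
    }

    boolean isSystem() {
        return catalog == null;
    }

    String asSerializableString() {
        return isSystem() ? name : catalog + "." + database + "." + name;
    }
}
```

Collapsing the 1-part case into the same type is what would let `CatalogTable.getConnection()` return one identifier type for both system and catalog connections, removing the need for a separate `FunctionIdentifier`-style class.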
Re: [DISCUSS] FLIP-529 Connections in Flink SQL and Table API
Sorry, I forgot to mention I also updated the FLIP to include table apis for connection. It was originally in examples but not in the public api section. On Tue, Jul 29, 2025 at 10:12 AM Hao Li wrote: > Hi Leonard, > > Thanks for the feedback and offline sync yesterday. > > > I think connection is a very common need for most catalogs like > MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need > a connection. > I added `TEMPORARY SYSTEM` connection so it's a global level connection > which can be used for Catalog creation. After syncing with Timo, we propose > to store it first in memory like `TEMPORARY SYSTEM FUNCTION` since this > FLIP is already introducing lots of concepts and interfaces. We can provide > `SYSTEM` connection and related interfaces to persist it in following up > FLIPs. > > > In this case, I think reducing connector development cost is more > important than making is explicit, the connector knows which options is > sensitive or not. > Sure. I updated the FLIP to merge connection options with table options so > it's easier for current connectors. > > > I hope the BasicConnectionFactory can be a common one that can feed most > common users case, otherwise encrypt all options is a good idea. > I updated to `DefaultConnectionFactory` which handles most of the secret > keys. > > Thanks, > Hao > > On Mon, Jul 28, 2025 at 6:13 AM Leonard Xu wrote: > >> Hey, Hao >> >> Please see my comments as follows: >> >> >> 2. I think we can also modify the create catalog ddl syntax. >> > >> > ``` >> > CREATE CATALOG cat USING CONNECTION mycat.mydb.mysql_prod >> > WITH ( >> >'type' = 'jdbc' >> > ); >> > ``` >> > >> > Does `mycat.mydb.mysql_prod` exist in another catalog? This feels like a >> > chicken-egg problem. I think for connections to be used for catalog >> > creation, it needs to be system connection similar to system function >> which >> > exist outside of CatalogManager. Maybe we can defer this to later >> > functionality? 
>> >> I think connection is a very common need for most catalogs like >> MySQLCatalog、KafkaCatalog、HiveCatalog and so on, all of these catalogs need >> a connection. >> >> >> >> 3. It seems the connector factory should merge the with options and >> > connection options together and then create the source/sink. It's >> > better that framework can merge all these options and connectors don't >> need >> > any codes. >> > >> > I think it's better to separate connections with table options and make >> it >> > explicit. Reasons is: the connection here should be a decrypted one. >> It's >> > sensitive and should be handled carefully regarding logging, usage etc. >> > Mixing with original table options makes it hard to do. But the Flip >> used >> > `CatalogConnection` which is an encrypted one. I'll update it to >> > `SensitveConnection`. >> >> (4)Framework-level Resolution : +1 to Shengkai's point about having the >> > framework (DynamicTableFactory) return complete options to reduce >> connector >> > adaptation cost. >> > >> > Please see my explanation for Shengkai's similar question. >> >> In this case, I think reducing connector development cost is more >> important than making is explicit, the connector knows which options is >> sensitive or not. >> >> >> >> 4. Why we need to introduce ConnectionFactory? I think connection is >> like >> > CatalogTable. It should hold the basic information and all information >> in >> > the connection should be stored into secret store. >> > >> > The main reason is to enable user defined connection handling for >> different >> > types. For example, if connection type is `basic`, the corresponding >> > factory can handle basic type secrets (e.g. extract username/password >> from >> > connection options and do encryption). >> > >> >> (2) Configurability of SECRET_FIELDS : Could the hardcoded >> SECRET_FIELDS >> > in BasicConnectionFactory be made configurable (e.g., 'token' vs >> > 'accessKey') for better connector compatibility? 
>> > >> > This depends on `ConnectionFactory` implementation and can be self >> defined >> > by user. >> >> I hope the BasicConnectionFactory can be a common one that can feed most >> common users case, otherwis
Re: [DISCUSS] FLIP-539: Support WITH Clause in CREATE FUNCTION Statement in Flink SQL
Hi Dawid, Thanks for the FLIP. +1 to support options for functions as well. I have one question: Do you want to update `createTemporarySystemFunction` in `TableEnvironment` to support `FunctionDescriptor` as well? Thanks, Hao On Tue, Jul 29, 2025 at 10:33 AM Yash Anand wrote: > Hi Dawid, > > Thank you for initiating this FLIP, looks like a useful addition. +1 for the > FLIP. > > I have just one question: will the option keys be limited to some > predefined values, or could they be any key the user wants to add as metadata? > Like if the user wants to add a description for each argument? > > On Tue, Jul 29, 2025 at 7:09 AM Timo Walther wrote: > > > Maybe not in the first version, but eventually nothing in the design > > blocks us from supporting this. The SecretStore would need to be > > available in the FunctionDefinitionFactory for this. > > > > Cheers, > > Timo > > > > On 29.07.25 15:08, Ryan van Huuksloot wrote: > > > Overall the FLIP looks good to me. > > > > > > Would the properties support the SecretStore proposed in FLIP-529? > > > > > > Otherwise, +1, thanks! > > > > > > Ryan van Huuksloot > > > Staff Engineer, Infrastructure | Streaming Platform > > > > > > On Tue, Jul 29, 2025 at 4:19 AM Jacky Lau > wrote: > > > > > >> Thanks for initiating this! > > >> > > >> +1 for this proposal. > > >> > > >> Sergey Nuyanzin wrote on Tue, Jul 29, 2025 at 15:34: > > >>> Thanks for driving this Dawid. > > >>> > > >>> looks reasonable to me > > >>> > > >>> On Mon, Jul 28, 2025 at 5:03 PM Ramin Gharib > > >>> wrote: > > > > Hello Dawid, > > > > Thanks for initiating this! The FLIP looks well-written. > > The WITH clause brings consistency to existing syntax. > > > > +1 for this proposal.
> > > > On Mon, Jul 28, 2025 at 3:33 PM Dawid Wysakowicz < > > >> dwysakow...@apache.org > > > > wrote: > > > > > Hi, > > > I'd like to start a discussion of FLIP-539: Support WITH Clause in > > >>> CREATE > > > FUNCTION Statement in Flink SQL [1]. > > > > > > The existing CREATE FUNCTION statement in Flink SQL allows users > to > > > register user-defined functions (UDFs) by specifying the class name > > >>> and the > > > artifact (JAR) containing the implementation. While this design > > >> covers > > > common use cases, it lacks a declarative mechanism for associating > > > arbitrary properties or metadata with the function at creation > time. > > > > > > Other Flink SQL objects—such as tables—support a WITH clause for > > > specifying options in a key-value fashion, improving consistency, > > > discoverability, and extensibility. > > > Looking forward to comments and suggestions for improvements! > > > > > > Best, > > > Dawid > > > > > > [1] https://cwiki.apache.org/confluence/x/sg9JFg > > > > > >>> > > >>> > > >>> > > >>> -- > > >>> Best regards, > > >>> Sergey > > >>> > > >> > > > > > > > >
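For illustration, the WITH clause under discussion would extend the existing CREATE FUNCTION syntax roughly as below. The clause placement follows the table DDL convention the thread points to; the option keys shown (`description`, `argument.0.description`) are invented placeholders, not keys defined by FLIP-539:

```sql
-- Sketch: a WITH clause appended to today's CREATE FUNCTION statement.
CREATE FUNCTION my_scalar AS 'com.example.MyScalarFunction'
LANGUAGE JAVA
USING JAR 'file:///tmp/udf.jar'
WITH (
  'description' = 'Scores an input string',
  'argument.0.description' = 'the raw input string'
);
```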
[VOTE] FLIP-529: Connections in Flink SQL and TableAPI
Hi everyone, I'd like to start a vote on FLIP-529: Connections in Flink SQL and TableAPI [1], which has been discussed in this thread [2]. The vote will be open for at least 72 hours unless there is an objection or not enough votes. Thanks, Hao [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-529%3A+Connections+in+Flink+SQL+and+TableAPI [2] https://lists.apache.org/thread/np8qx40tmhby7tv3dy71pd75nrc094rc
[jira] [Created] (FLINK-34992) FLIP-437: Support ML Models in Flink SQL
Hao Li created FLINK-34992: -- Summary: FLIP-437: Support ML Models in Flink SQL Key: FLINK-34992 URL: https://issues.apache.org/jira/browse/FLINK-34992 Project: Flink Issue Type: New Feature Reporter: Hao Li This is an umbrella task for FLIP-437. FLIP-437: https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (FLINK-34993) Support Model CRUD in parser
Hao Li created FLINK-34993: -- Summary: Support Model CRUD in parser Key: FLINK-34993 URL: https://issues.apache.org/jira/browse/FLINK-34993 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35013) Support temporary model
Hao Li created FLINK-35013: -- Summary: Support temporary model Key: FLINK-35013 URL: https://issues.apache.org/jira/browse/FLINK-35013 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35014) SqlNode to operation conversion for models
Hao Li created FLINK-35014: -- Summary: SqlNode to operation conversion for models Key: FLINK-35014 URL: https://issues.apache.org/jira/browse/FLINK-35014 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35016) Catalog changes for model CRUD
Hao Li created FLINK-35016: -- Summary: Catalog changes for model CRUD Key: FLINK-35016 URL: https://issues.apache.org/jira/browse/FLINK-35016 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35018) ML_EVALUATE function
Hao Li created FLINK-35018: -- Summary: ML_EVALUATE function Key: FLINK-35018 URL: https://issues.apache.org/jira/browse/FLINK-35018 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35017) ML_PREDICT function
Hao Li created FLINK-35017: -- Summary: ML_PREDICT function Key: FLINK-35017 URL: https://issues.apache.org/jira/browse/FLINK-35017 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-35019) Support show create model syntax
Hao Li created FLINK-35019: -- Summary: Support show create model syntax Key: FLINK-35019 URL: https://issues.apache.org/jira/browse/FLINK-35019 Project: Flink Issue Type: Sub-task Reporter: Hao Li show options in addition to input/output schema
[jira] [Created] (FLINK-35020) Model Catalog implementation in Hive etc
Hao Li created FLINK-35020: -- Summary: Model Catalog implementation in Hive etc Key: FLINK-35020 URL: https://issues.apache.org/jira/browse/FLINK-35020 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37780) ml_predict sql function skeleton
Hao Li created FLINK-37780: -- Summary: ml_predict sql function skeleton Key: FLINK-37780 URL: https://issues.apache.org/jira/browse/FLINK-37780 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37778) model keyword syntax change
Hao Li created FLINK-37778: -- Summary: model keyword syntax change Key: FLINK-37778 URL: https://issues.apache.org/jira/browse/FLINK-37778 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37779) Add getModel to Flink catalog reader etc
Hao Li created FLINK-37779: -- Summary: Add getModel to Flink catalog reader etc Key: FLINK-37779 URL: https://issues.apache.org/jira/browse/FLINK-37779 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37777) FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hao Li created FLINK-37777: -- Summary: FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design Key: FLINK-37777 URL: https://issues.apache.org/jira/browse/FLINK-37777 Project: Flink Issue Type: New Feature Components: Table SQL / API, Table SQL / Planner, Table SQL / Runtime Reporter: Hao Li Implement ml_predict and ml_evaluate functions in https://cwiki.apache.org/confluence/display/FLINK/FLIP-525%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Implementation+Design
[jira] [Created] (FLINK-37799) Flink native model predict runtime implementation
Hao Li created FLINK-37799: -- Summary: Flink native model predict runtime implementation Key: FLINK-37799 URL: https://issues.apache.org/jira/browse/FLINK-37799 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37797) Documentation
Hao Li created FLINK-37797: -- Summary: Documentation Key: FLINK-37797 URL: https://issues.apache.org/jira/browse/FLINK-37797 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37789) Integrate with sqlvalidator
Hao Li created FLINK-37789: -- Summary: Integrate with sqlvalidator Key: FLINK-37789 URL: https://issues.apache.org/jira/browse/FLINK-37789 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37791) Integrate with sql to rel converter
Hao Li created FLINK-37791: -- Summary: Integrate with sql to rel converter Key: FLINK-37791 URL: https://issues.apache.org/jira/browse/FLINK-37791 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37790) Add model factory related interfaces/classes
Hao Li created FLINK-37790: -- Summary: Add model factory related interfaces/classes Key: FLINK-37790 URL: https://issues.apache.org/jira/browse/FLINK-37790 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37792) Physical rewrite for ml_predict
Hao Li created FLINK-37792: -- Summary: Physical rewrite for ml_predict Key: FLINK-37792 URL: https://issues.apache.org/jira/browse/FLINK-37792 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37794) ml_evaluate sql function skeleton
Hao Li created FLINK-37794: -- Summary: ml_evaluate sql function skeleton Key: FLINK-37794 URL: https://issues.apache.org/jira/browse/FLINK-37794 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37795) Physical rewrite for ml_evaluate
Hao Li created FLINK-37795: -- Summary: Physical rewrite for ml_evaluate Key: FLINK-37795 URL: https://issues.apache.org/jira/browse/FLINK-37795 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37793) Codegen for ml_predict
Hao Li created FLINK-37793: -- Summary: Codegen for ml_predict Key: FLINK-37793 URL: https://issues.apache.org/jira/browse/FLINK-37793 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37796) Codegen for ml_evaluate
Hao Li created FLINK-37796: -- Summary: Codegen for ml_evaluate Key: FLINK-37796 URL: https://issues.apache.org/jira/browse/FLINK-37796 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37819) Add column expansion test for model TVF
Hao Li created FLINK-37819: -- Summary: Add column expansion test for model TVF Key: FLINK-37819 URL: https://issues.apache.org/jira/browse/FLINK-37819 Project: Flink Issue Type: Sub-task Reporter: Hao Li https://github.com/apache/flink/pull/26553/files#r2097261694
[jira] [Created] (FLINK-37849) Model provider factory discover
Hao Li created FLINK-37849: -- Summary: Model provider factory discover Key: FLINK-37849 URL: https://issues.apache.org/jira/browse/FLINK-37849 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-37978) Change max-buffer config to max-concurrent-operations
Hao Li created FLINK-37978: -- Summary: Change max-buffer config to max-concurrent-operations Key: FLINK-37978 URL: https://issues.apache.org/jira/browse/FLINK-37978 Project: Flink Issue Type: Sub-task Reporter: Hao Li Change max-buffer config to max-concurrent-operations
[jira] [Created] (FLINK-37929) Support cdc stream for ml_predict
Hao Li created FLINK-37929: -- Summary: Support cdc stream for ml_predict Key: FLINK-37929 URL: https://issues.apache.org/jira/browse/FLINK-37929 Project: Flink Issue Type: Sub-task Reporter: Hao Li Support cdc stream for ml_predict if user can indicate model is deterministic through config
[jira] [Created] (FLINK-37928) Support only insert only stream for ml_predict
Hao Li created FLINK-37928: -- Summary: Support only insert only stream for ml_predict Key: FLINK-37928 URL: https://issues.apache.org/jira/browse/FLINK-37928 Project: Flink Issue Type: Sub-task Reporter: Hao Li Since ml_predict isn't deterministic by nature
[jira] [Created] (FLINK-37939) Metrics for ml predict function
Hao Li created FLINK-37939: -- Summary: Metrics for ml predict function Key: FLINK-37939 URL: https://issues.apache.org/jira/browse/FLINK-37939 Project: Flink Issue Type: Sub-task Reporter: Hao Li request counter; latency gauge or histogram; success/failure counter
[jira] [Created] (FLINK-37938) ml evaluate implementation for different task types
Hao Li created FLINK-37938: -- Summary: ml evaluate implementation for different task types Key: FLINK-37938 URL: https://issues.apache.org/jira/browse/FLINK-37938 Project: Flink Issue Type: Sub-task Reporter: Hao Li ml evaluate implementation for different task types
[jira] [Created] (FLINK-38067) Release Testing: Verify FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design
Hao Li created FLINK-38067: -- Summary: Release Testing: Verify FLIP-525: Model ML_PREDICT, ML_EVALUATE Implementation Design Key: FLINK-38067 URL: https://issues.apache.org/jira/browse/FLINK-38067 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38068) Fix distribution
Hao Li created FLINK-38068: -- Summary: Fix distribution Key: FLINK-38068 URL: https://issues.apache.org/jira/browse/FLINK-38068 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38066) Release Testing: Verify FLIP-437: Support ML Models in Flink SQL
Hao Li created FLINK-38066: -- Summary: Release Testing: Verify FLIP-437: Support ML Models in Flink SQL Key: FLINK-38066 URL: https://issues.apache.org/jira/browse/FLINK-38066 Project: Flink Issue Type: Sub-task Reporter: Hao Li
[jira] [Created] (FLINK-38084) Update model provider download doc
Hao Li created FLINK-38084: -- Summary: Update model provider download doc Key: FLINK-38084 URL: https://issues.apache.org/jira/browse/FLINK-38084 Project: Flink Issue Type: Sub-task Reporter: Hao Li See [https://github.com/apache/flink/pull/26770#discussion_r2194229075.] Add download doc once openai provider jar is published in 2.1 release.
[jira] [Created] (FLINK-38104) FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API
Hao Li created FLINK-38104: -- Summary: FLIP-526: Model ML_PREDICT, ML_EVALUATE Table API Key: FLINK-38104 URL: https://issues.apache.org/jira/browse/FLINK-38104 Project: Flink Issue Type: New Feature Reporter: Hao Li Implement https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API
[jira] [Created] (FLINK-38020) NonTimeRangeUnboundedPrecedingFunction failing with NPE
Hao Li created FLINK-38020: -- Summary: NonTimeRangeUnboundedPrecedingFunction failing with NPE Key: FLINK-38020 URL: https://issues.apache.org/jira/browse/FLINK-38020 Project: Flink Issue Type: Bug Reporter: Hao Li [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/main/java/org/apache/flink/table/runtime/operators/over/NonTimeRangeUnboundedPrecedingFunction.java#L425-L432] is missing a call to `setAccumulator`. [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/test/java/org/apache/flink/table/runtime/operators/over/NonTimeRangeUnboundedPrecedingFunctionTest.java#L55] didn't catch it because `sum` in [https://github.com/apache/flink/blob/master/flink-table/flink-table-runtime/src/test/java/org/apache/flink/table/runtime/operators/over/SumAggsHandleFunction.java#L30] is a primitive long. It will fail with an NPE if changed to `Long`.
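The masking effect this report describes can be reproduced in isolation: a primitive accumulator field silently defaults to 0, so a code path that forgets to restore state still "works", while a boxed field defaults to null and the implicit unboxing throws. This is a toy illustration, not the Flink operator or test code:

```java
// Toy accumulators mirroring the long-vs-Long difference from the report.
class PrimitiveAcc {
    long sum; // defaults to 0L even if never restored -> bug goes unnoticed
    long current() { return sum; }
}

class BoxedAcc {
    Long sum; // defaults to null if never restored
    long current() { return sum; } // unboxing null throws NullPointerException
}
```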
[jira] [Created] (FLINK-38117) Add anonymous ContextResolvedModel
Hao Li created FLINK-38117: -- Summary: Add anonymous ContextResolvedModel Key: FLINK-38117 URL: https://issues.apache.org/jira/browse/FLINK-38117 Project: Flink Issue Type: Sub-task Reporter: Hao Li