Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL

Timo Walther Thu, 28 Mar 2024 09:32:04 -0700

Hi everyone,

I updated the FLIP according to this discussion.

@Hao Li: Let me know if I made a mistake somewhere. I added some additional explanatory comments about the new PTF syntax.

There are no further objections from my side. If nobody objects, Hao, feel free to start the vote tomorrow.


Regards,
Timo


On 28.03.24 16:30, Jark Wu wrote:

Thanks, Hao,

Sounds good to me.

Best,
Jark

On Thu, 28 Mar 2024 at 01:02, Hao Li <[email protected]> wrote:

Hi Jark,

I think we can start with supporting popular model providers such as
openai, azureml, sagemaker for remote models.

Thanks,
Hao

On Tue, Mar 26, 2024 at 8:15 PM Jark Wu <[email protected]> wrote:

Thanks for the PoC and updating,

The final syntax looks good to me; at least it is a nice and concise first step.

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

Besides, what built-in models will we support in the FLIP? This might be important because it relates to what use cases can run with the new Flink version out of the box.

Best,
Jark

On Wed, 27 Mar 2024 at 01:10, Hao Li <[email protected]> wrote:

Hi Timo,

Yeah. For `primary key` and `from table(...)` those are explicitly matched in the parser: [1].

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

This named argument syntax looks good to me. It can be supported together with

SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
`my_cat`.`my_db`.`classifier_model`, DESCRIPTOR(f1, f2));

Sure. I will let you know once I have updated the FLIP.

[1] https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814


Thanks,
Hao

On Tue, Mar 26, 2024 at 4:15 AM Timo Walther <[email protected]> wrote:

Hi Hao,

  > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
  > work since `TABLE` and `MODEL` are already key words

This argument doesn't hold. The parser supports introducing keywords that are still non-reserved. For example, this enables using "key" both for a primary key and as a column name:

CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
WITH ('connector' = 'datagen');

SELECT i AS key FROM t;

I'm sure we will introduce `TABLE(my_data)` eventually, as this is what the standard dictates. But for now, let's use the most compact syntax possible, which is also in sync with Oracle.

TLDR: We allow identifiers as arguments for PTFs, which are expanded with catalog and database if necessary. Those identifier arguments translate to catalog lookups for tables and models. The ML_ functions will make sure that the arguments are of the correct type, model or table.

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

So this will allow us to also use in the future:

SELECT * FROM poly_func(table1);

Same support as Oracle [1]. Very concise.
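A minimal Java sketch of the TLDR above, for illustration only. A one- or two-part identifier is expanded with the current catalog and database, then looked up and checked for the expected kind (table vs. model). The class, the enum and the in-memory map standing in for the catalog are all hypothetical. This is not Flink's CatalogManager or validator API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Flink's CatalogManager or validator API.
final class PtfIdentifierResolution {

    enum ObjectKind { TABLE, MODEL }

    // Toy stand-in for the catalog: fully qualified name -> kind of object.
    private final Map<String, ObjectKind> objects = new HashMap<>();
    private final String currentCatalog;
    private final String currentDatabase;

    PtfIdentifierResolution(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    void register(String catalog, String database, String name, ObjectKind kind) {
        objects.put(String.join(".", catalog, database, name), kind);
    }

    // Expands `name`, `db.name` or `cat.db.name` into a three-part identifier.
    List<String> qualify(List<String> parts) {
        switch (parts.size()) {
            case 1:
                return List.of(currentCatalog, currentDatabase, parts.get(0));
            case 2:
                return List.of(currentCatalog, parts.get(0), parts.get(1));
            case 3:
                return List.copyOf(parts);
            default:
                throw new IllegalArgumentException("Invalid identifier: " + parts);
        }
    }

    // Resolves an identifier argument and rejects it if it is missing or has the wrong kind,
    // e.g. a table passed where ML_PREDICT expects a model.
    String resolve(List<String> parts, ObjectKind expected) {
        String fullName = String.join(".", qualify(parts));
        ObjectKind actual = objects.get(fullName);
        if (actual == null) {
            throw new IllegalArgumentException("Object not found: " + fullName);
        }
        if (actual != expected) {
            throw new IllegalArgumentException(
                    fullName + " is a " + actual + ", but a " + expected + " was expected");
        }
        return fullName;
    }
}
```

With current catalog `my_cat` and database `my_db`, resolving `classifier_model` as a MODEL yields `my_cat.my_db.classifier_model`. Passing `my_data` where a model is expected is rejected.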

Let me know when you have updated the FLIP for a final review before voting.


Do others have additional objections?

Regards,
Timo

[1] https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html




On 25.03.24 23:40, Hao Li wrote:

Hi Timo,

Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument?


I checked this: currently the validator throws an exception trying to get the fully qualified name for `classifier_model`. But since `SqlValidatorImpl` is implemented in Flink, we should be able to fix this.

The only caveat is that if the full model path is not provided, the qualifier is interpreted as a column. We should be able to handle this specially, though, by rewriting the `ml_predict` function to add the catalog and database name in `FlinkCalciteSqlValidator`.

SELECT f1, f2, label FROM
     ML_PREDICT(
       TABLE `my_data`,
       my_cat.my_db.classifier_model,
       DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

I verified these can be parsed. The problem is in the validator for the qualifier, as mentioned above.

So the safest option would be the long-term solution:


SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE(my_data),
       model => MODEL(my_cat.my_db.classifier_model),
       args => DESCRIPTOR(f1, f2))

`TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't work since `TABLE` and `MODEL` are already keywords in Calcite, used by `CREATE TABLE` and `CREATE MODEL`. Changing to `model_name(...)` works, and it will be treated as a function.

So I think

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))
should be fine for now.

For the syntax part:
1). Sounds good. We can drop model task and model kind from the definition. They can be deduced from the options.
2). Sure. We can add temporary models.
3). Makes sense. We can use `show create model <name>` to display all information and `describe model <name>` to show the input/output schema.

Thanks,
Hao

On Mon, Mar 25, 2024 at 3:21 PM Hao Li <[email protected]> wrote:

Hi Ahmed,

Looks like the feature freeze for the 1.20 release is June 15th. We can definitely get the model DDL into 1.20. For the predict and evaluate functions, if we can't get them into the 1.20 release, we can get them into the 1.21 release for sure.

Thanks,
Hao



On Mon, Mar 25, 2024 at 1:25 AM Timo Walther <[email protected]> wrote:

Hi Jark and Hao,

thanks for the information, Jark! Great that the Calcite community already fixed the problem for us. +1 to adopt the simplified syntax asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if upgrading Calcite is too much work right now?

   > Is `DESCRIPTOR` a must in the syntax?

Yes, we should still stick to the standard as much as possible, and all vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal arguments. So the final syntax of this discussion would be:


SELECT f1, f2, label FROM
     ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2))

SELECT * FROM
     ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2))


Please double check if this is implementable with the current stack. I fear the parser or validator might not like the "identifier" argument?


Make sure that also these variations are supported:

SELECT f1, f2, label FROM
     ML_PREDICT(
       TABLE `my_data`,
       my_cat.my_db.classifier_model,
       DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

It might be safer and more future proof to wrap a MODEL() function around it. This would be more in sync with the standard, which actually still requires putting a TABLE() around the input argument:

ML_PREDICT(TABLE(`my_data`) PARTITIONED BY c1 ORDERED BY c1, ....)


So the safest option would be the long-term solution:

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE(my_data),
       model => MODEL(my_cat.my_db.classifier_model),
       args => DESCRIPTOR(f1, f2))

But I'm fine with this if others have a strong opinion:

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

Some feedback for the remainder of the FLIP:

1) Simplify catalog objects

I would suggest dropping:
CatalogModel.getModelKind()
CatalogModel.getModelTask()

A catalog object should fully resemble the DDL. And since the DDL puts those properties in the WITH clause, the catalog object should do the same (i.e. put them into `getModelOptions()`). Btw, renaming this method to just `getOptions()` for consistency would be good as well. Internally, we can still provide enums for these frequently used classes, similar to what we do in `FactoryUtil` for other frequently used options.

Remove `getDescription()` and `getDetailedDescription()`. They were a mistake for CatalogTable and should actually be deprecated. They got replaced by `getComment()`, which is sufficient.

2) CREATE TEMPORARY MODEL is not supported.

This is an unnecessary restriction. We should support temporary versions of these catalog objects as well for consistency. Adding support for this should be straightforward.

3) { DESCRIBE | DESC } MODEL [catalog_name.][database_name.]model_name

I would suggest we support `SHOW CREATE MODEL` instead. Similar to `SHOW CREATE TABLE`, this should show all properties. If we support `DESCRIBE MODEL`, it should only list the input parameters, similar to how `DESCRIBE TABLE` only shows the columns (not the WITH clause).
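Points 1) and 3) can be illustrated together with a hypothetical Java sketch (records need Java 16+). The catalog object carries only a schema, options and a comment. A frequently used option is exposed through an internal enum. `SHOW CREATE MODEL` renders every property, and `DESCRIBE MODEL` lists only the schema. The option keys `task` and `provider`, the enum values and the exact DDL layout are assumptions made for illustration, not the FLIP's final interface. Following Hao's reply in this thread, DESCRIBE here lists both input and output columns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of the simplified catalog object; not Flink's CatalogModel.
final class ModelCatalogSketch {

    record Column(String name, String type) {}

    // Mirrors the DDL: schema, options (WITH clause) and comment; no separate kind/task getters.
    record CatalogModel(String name, List<Column> input, List<Column> output,
                        Map<String, String> options, String comment) {}

    // Internal convenience enum for a frequently used option (values and "task" key are assumed).
    enum ModelTask { CLASSIFICATION, REGRESSION, CLUSTERING, GENERATION }

    static ModelTask task(CatalogModel model) {
        String value = model.options().get("task");
        if (value == null) {
            throw new IllegalStateException("Option 'task' is not set");
        }
        return ModelTask.valueOf(value.toUpperCase(Locale.ROOT));
    }

    // SHOW CREATE MODEL: everything, including the WITH clause (options sorted for stable output).
    static String showCreate(CatalogModel m) {
        StringBuilder sb = new StringBuilder("CREATE MODEL ").append(m.name()).append('\n');
        sb.append("INPUT (").append(columns(m.input())).append(")\n");
        sb.append("OUTPUT (").append(columns(m.output())).append(")\n");
        if (m.comment() != null) {
            sb.append("COMMENT '").append(m.comment()).append("'\n");
        }
        String options = new TreeMap<>(m.options()).entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
        return sb.append("WITH (\n").append(options).append("\n)").toString();
    }

    // DESCRIBE MODEL: schema only, no options (like DESCRIBE TABLE omits the WITH clause).
    static List<String> describe(CatalogModel m) {
        List<String> rows = new ArrayList<>();
        for (Column c : m.input()) {
            rows.add(c.name() + " " + c.type() + " (input)");
        }
        for (Column c : m.output()) {
            rows.add(c.name() + " " + c.type() + " (output)");
        }
        return rows;
    }

    private static String columns(List<Column> cols) {
        return cols.stream().map(c -> c.name() + " " + c.type()).collect(Collectors.joining(", "));
    }
}
```

Task and kind never need their own getters here because they are ordinary options, which is the point of suggestion 1).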

Regards,
Timo


On 23.03.24 13:17, Ahmed Hamdy wrote:

Hi everyone,
+1 for this proposal, I believe it is very useful. At a minimum, it would be great to have "ML_PREDICT" and "ML_EVALUATE" as built-in PTFs in this FLIP, as discussed.
IIUC this will be included in the 1.20 roadmap?
Best Regards
Ahmed Hamdy


On Fri, 22 Mar 2024 at 23:54, Hao Li <[email protected]> wrote:

Hi Timo and Jark,

I agree Oracle's syntax seems concise and more descriptive. For the built-in `ML_PREDICT` and `ML_EVALUATE` functions, I agree with Jark that we can support them as built-in PTFs using `SqlTableFunction` for this FLIP. We can have a different FLIP discussing user-defined PTFs and adopt that for model functions later. To summarize, the current proposed syntax is

SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
`classifier_model`, f1, f2))

SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
`classifier_model`, f1, f2))

Is `DESCRIPTOR` a must in the syntax? If so, it becomes

SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
`classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2)))

SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
`classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2)))

If Calcite supports dropping the outer TABLE keyword, it becomes

SELECT f1, f2, label FROM ML_PREDICT(TABLE `my_data`,
`classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2))

SELECT * FROM ML_EVALUATE(TABLE `eval_data`,
`classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2))

Thanks,
Hao



On Fri, Mar 22, 2024 at 9:16 AM Jark Wu <[email protected]> wrote:

Sorry, I mean we can bump the Calcite version if needed in Flink 1.20.


On Fri, 22 Mar 2024 at 22:19, Jark Wu <[email protected]> wrote:

Hi Timo,

Introducing user-defined PTF is very useful in Flink, I'm +1 for this.
But I think the ML model FLIP is not blocked by this, because we can introduce ML_PREDICT and ML_EVALUATE as built-in PTFs just like TUMBLE/HOP, and support user-defined ML functions as a future FLIP.

Regarding the simplified PTF syntax, which drops the outer TABLE() keyword, it seems it was just supported[1] by the Calcite community last month and will be released in the next version (v1.37). The Calcite community is preparing the 1.37 release, so we can bump the version if needed in Flink 1.19.


Best,
Jark

[1]: https://issues.apache.org/jira/browse/CALCITE-6254

On Fri, 22 Mar 2024 at 21:46, Timo Walther <[email protected]> wrote:

Hi everyone,

this is a very important change to the Flink SQL syntax, but we can't wait until the SQL standard is ready for this. So I'm +1 on introducing the MODEL concept as a first class citizen in Flink.

For your information: over the past months I have already spent a significant amount of time thinking about how we can introduce PTFs in Flink. I reserved FLIP-440[1] for this purpose and I will share a version of this in the next 1-2 weeks.

For a good implementation of FLIP-440 and also FLIP-437, we should evolve the PTF syntax in collaboration with Apache Calcite.

There are different syntax versions out there:

1) Flink

SELECT * FROM
      TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));


2) SQL standard

SELECT * FROM
      TABLE(TUMBLE(TABLE(Bid), DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));


3) Oracle

SELECT * FROM
       TUMBLE(Bid, COLUMNS(bidtime), INTERVAL '10' MINUTES);

As you can see above, Flink does not follow the standard correctly, as it would need to use `TABLE()`, but this is not provided by Calcite yet.


I really like the Oracle syntax[2][3] a lot. It reduces necessary keywords to a minimum. Personally, I would like to discuss this syntax in a separate FLIP and hope I will find supporters for:


SELECT * FROM
      TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES);


If we go entirely with the Oracle syntax, as you can see in the example, Oracle allows for passing identifiers directly. This would solve our problems for the MODEL as well:

SELECT f1, f2, label FROM ML_PREDICT(
      data => `my_data`,
      model => `classifier_model`,
      input => DESCRIPTOR(f1, f2));

Or we completely adopt the Oracle syntax:

SELECT f1, f2, label FROM ML_PREDICT(
      data => `my_data`,
      model => `classifier_model`,
      input => COLUMNS(f1, f2));


What do you think?

Happy to create a FLIP for just this syntax question and collaborate with the Calcite community on this. It shouldn't be too hard to convince them to support Oracle's syntax, at least as a parser parameter.


Regards,
Timo

[1] https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-440%3A+User-defined+Polymorphic+Table+Functions
[2] https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_TF.html#GUID-0F66E239-DE77-4C0E-AC76-D5B632AB8072
[3] https://oracle-base.com/articles/18c/polymorphic-table-functions-18c




On 20.03.24 17:22, Mingge Deng wrote:

Thanks Jark for all the insightful comments.

We have updated the proposal per our offline discussions:
1. Model will be treated as a new relation in FlinkSQL.
2. Include the common ML predict and evaluate functions in open source Flink to complete the user journey.
        And we should be able to extend the Calcite SqlTableFunction to support these two ML functions.

Best,
Mingge

On Mon, Mar 18, 2024 at 7:05 PM Jark Wu <[email protected]> wrote:

Hi Hao,

I meant how the table name in window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVF specially in parser/planner so that catalog is looked up when they are seen?

The table names are resolved and validated by the Calcite SqlValidator. We don't need to fetch them from the catalog manually.
The specific checking logic of the cumulate window happens in SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes.
The return type of SqlCumulateTableFunction is defined in the #getRowTypeInference() method.
Both are public interfaces provided by Calcite, and it seems it's not specially handled in the parser/planner.

I didn't try that, but my gut feeling is that the framework is ready to extend a customized TVF.
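The two responsibilities Jark points to (operand checking and return row type inference) can be made concrete for ML_PREDICT with a hypothetical plain-Java sketch. It deliberately does not use Calcite's APIs. It checks the DESCRIPTOR columns against the model's inputs and infers an output row of all input columns followed by the model's output columns, which matches the `SELECT f1, f2, label` examples in this thread. Matching descriptor columns to model inputs by position, and representing types as strings, are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; it does not use Calcite's table function APIs.
final class MlPredictTypeInference {

    /**
     * Checks the operands of ML_PREDICT(input, model, DESCRIPTOR(...)) and infers the output row.
     * Schemas are ordered maps of column name -> type name.
     */
    static Map<String, String> checkAndInfer(Map<String, String> inputTable,
                                             Map<String, String> modelInput,
                                             Map<String, String> modelOutput,
                                             List<String> descriptorColumns) {
        // Operand check 1: one descriptor column per model input.
        if (descriptorColumns.size() != modelInput.size()) {
            throw new IllegalArgumentException("Expected " + modelInput.size()
                    + " descriptor columns but got " + descriptorColumns.size());
        }
        // Operand check 2: each column exists and matches the model input type at the same position.
        List<String> expectedTypes = new ArrayList<>(modelInput.values());
        for (int i = 0; i < descriptorColumns.size(); i++) {
            String column = descriptorColumns.get(i);
            String actualType = inputTable.get(column);
            if (actualType == null) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            if (!actualType.equals(expectedTypes.get(i))) {
                throw new IllegalArgumentException("Column " + column + " has type " + actualType
                        + " but the model expects " + expectedTypes.get(i));
            }
        }
        // Row type inference: all input columns followed by the model's output columns.
        Map<String, String> result = new LinkedHashMap<>(inputTable);
        for (Map.Entry<String, String> e : modelOutput.entrySet()) {
            if (result.putIfAbsent(e.getKey(), e.getValue()) != null) {
                throw new IllegalArgumentException("Output column clashes with input: " + e.getKey());
            }
        }
        return result;
    }
}
```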

For what model is, I'm wondering if it has to be datatype or relation. Can it be another kind of citizen parallel to datatype/relation/function/db? Redshift also supports `show models` operation, so it seems it's treated specially as well?

If it is an entity only used in catalog scope (e.g., show xxx, create xxx, drop xxx), it is fine to introduce it.
We have introduced such an entity before, called Module: "load module", "show modules" [1].
But if we want to use Model in TVF parameters, it has to be a relation or a datatype, because that is all a TVF accepts now.

Thanks for sharing the reason for preferring TVF over the Redshift way. It sounds reasonable to me.

Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/


On Fri, 15 Mar 2024 at 13:41, Hao Li <[email protected]> wrote:

Hi Jark,

Thanks for the pointer. Sorry for the confusion: I meant how the table name in window TVF gets translated to `SqlCallingBinding`. Probably we need to fetch the table definition from the catalog somewhere. Do we treat those window TVF specially in parser/planner so that catalog is looked up when they are seen?

For what model is, I'm wondering if it has to be datatype or relation. Can it be another kind of citizen parallel to datatype/relation/function/db? Redshift also supports `show models` operation, so it seems it's treated specially as well? The reasons I don't like Redshift's syntax are:

1. It's a bit verbose: you need to think of a model name as well as a function name, and the function name also needs to be unique.

2. More importantly, the prediction function isn't the only function that can operate on models. There could be a set of inference functions [1] and evaluation functions [2] which can operate on models. It's hard to specify all of them in model creation.

[1]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
[2]: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate


Thanks,
Hao

On Thu, Mar 14, 2024 at 8:18 PM Jark Wu <[email protected]> wrote:

Hi Hao,

Can you send me some pointers where the function gets the table information?

Here is the code of cumulate window type checking [1].

Also, is it possible to support <query_stmt> in window functions in addition to table?

Yes. It is not allowed in TVF.

Thanks for the syntax links of other systems. The reason I prefer the Redshift way is that it avoids introducing Model as a relation or datatype (referenced as a parameter in TVF).
Model is not a relation because it cannot be queried directly (e.g., SELECT FROM model).
I'm also confused about making Model a datatype, because I don't know what class the model parameter of the eval method of TableFunction/ScalarFunction should be. By defining the function with the model, users can directly invoke the function without reference to the model name.

Best,
Jark

[1]: https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53


On Fri, 15 Mar 2024 at 02:48, Hao Li <[email protected]> wrote:

Hi Jark,

Thanks for the pointers. It's very helpful.

1. Looks like `tumble`, `hopping` are keywords in the Calcite parser. And the syntax `cumulate(Table my_table, ...)` needs to get table information from the catalog somewhere for type validation etc. Can you send me some pointers where the function gets the table information?
2. The ideal syntax for model functions, I think, would be `ML_PREDICT(MODEL <model_name>, {table <table_name> | (query_stmt) })`. I think with special handling of the `ML_PREDICT` function in the parser/planner, maybe we can do this like window functions. But to support the `MODEL` keyword, we need a Calcite parser change, I guess. Also, is it possible to support <query_stmt> in window functions in addition to table?

For the Redshift syntax, I'm not sure of the purpose of defining the function name with the model. Is it to define the function input/output schema? We have the schema in our create model syntax, and `ML_PREDICT` can handle it by getting the model definition. I think our syntax, with a generic prediction function, is more concise. I also did some research, and it's the syntax used by Databricks `ai_query` [1], Snowflake `predict` [2] and Azureml `predict` [3].

[1]: https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
[2]: https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
[3]: https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current


Thanks,
Hao

On Wed, Mar 13, 2024 at 8:57 PM Jark Wu <[email protected]> wrote:

Hi Mingge, Hao,

Thanks for your replies.

PTF is actually the ideal approach for model functions, and we do have plans to use PTF for all model functions (including prediction, evaluation etc.) once PTF is supported in the FlinkSQL confluent extension.

It sounds like PTF is the ideal way and the table function is a temporary solution which will be dropped in the future.
I'm not sure whether we can implement it using PTF in Flink SQL. But we have implemented window functions using PTF[1], and introduced a new window function (called CUMULATE[2]) in Flink SQL based on this. I think it might work to use PTF and implement model function syntax like this:

SELECT * FROM TABLE(ML_PREDICT(
      TABLE my_table,
      my_model,
      col1,
      col2
));

Besides, did you consider following the way of AWS Redshift [3], which defines the model function together with the model itself?
IIUC, a model is a black box that defines input parameters and output parameters, which can be modeled as functions.


Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session
[2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
[3]: https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model





On Wed, 13 Mar 2024 at 15:00, Hao Li <[email protected]> wrote:

Hi Jark,

Thanks for your questions. These are good questions!

1. The polymorphic table function I was referring to takes a table as input and outputs a table. So the syntax would be like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported yet in Flink. So before it's supported, one option for the predict function is using a table function, which can output multiple columns:
```
SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('model', col1, col2))
```

2. Good question. Type inference is hard for the `ML_PREDICT` function because it takes a model name string as input. I can think of three ways of doing type inference for it.
       1). Treat the `ML_PREDICT` function as something special: if it's encountered during SQL parsing or planning, we look up the model in the catalog using the first argument, which is a model name. Then we can infer the input/output for the function.
       2). We can define a `model` keyword and use that in the predict function to indicate that the argument refers to a model. So it's like `ML_PREDICT(model 'my_model', col1, col2)`.
       3). We can create a special type of table function, maybe called `ModelFunction`, which can resolve the model type inference by special handling during parsing or planning time.
1) is hacky, 2) isn't supported in Flink for functions, 3) might be a good option.

3. I sketched the `ML_PREDICT` function for inference. But the function has the limitations mentioned in 1 and 2. So maybe we don't need to introduce them as built-in functions until polymorphic table functions are supported and we can properly deal with type inference. After that, defining a user-defined model function should also be straightforward.

4. For model types, do you mean 'remote', 'import', 'native' models or other things?

5. We could support popular providers such as 'azureml', 'vertexai', 'googleai' as long as we support the `ML_PREDICT` function. Users should be able to implement 3rd-party providers if they can implement a function handling the input/output for the provider.
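A hypothetical Java sketch of the extension point described in point 5. A provider simply maps a row of model inputs to a row of model outputs, and the model's options select which registered provider handles the call. The `ModelProvider` interface, the registry class and the `provider` option key are assumptions for illustration, not an API from Flink or the FLIP. A real provider would call the remote service rather than compute locally.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Flink or the FLIP.
final class ModelProviders {

    // A provider maps one row of model inputs to one row of model outputs.
    @FunctionalInterface
    interface ModelProvider {
        Map<String, Object> predict(Map<String, String> modelOptions, Map<String, Object> inputRow);
    }

    private final Map<String, ModelProvider> providers = new HashMap<>();

    // Third-party providers register themselves under a name such as "azureml" or "vertexai".
    void register(String name, ModelProvider provider) {
        if (providers.putIfAbsent(name, provider) != null) {
            throw new IllegalArgumentException("Provider already registered: " + name);
        }
    }

    // Dispatches to the provider named by the model's (assumed) 'provider' option.
    Map<String, Object> predict(Map<String, String> modelOptions, Map<String, Object> inputRow) {
        String name = modelOptions.get("provider");
        ModelProvider provider = providers.get(name);
        if (provider == null) {
            throw new IllegalArgumentException("Unknown provider: " + name);
        }
        return provider.predict(modelOptions, inputRow);
    }
}
```

Because `ModelProvider` has a single abstract method, a simple provider can be registered as a lambda. Providers that need credentials or endpoints would read them from the same options map.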

I think that, for the model functions, there are still dependencies or hacks we need to sort out before they can be built-in functions. Maybe we can split that into a follow-up if we want them built-in, and focus on the model syntax for this FLIP?

Thanks,
Hao

On Tue, Mar 12, 2024 at 10:33 PM Jark Wu <[email protected]> wrote:

Hi Mingge, Chris, Hao,

Thanks for proposing this interesting idea. I think this is a nice step towards the AI world for Apache Flink. I don't know much about AI/ML, so I may have some stupid questions.

1. Could you tell us more about why polymorphic table functions (PTF) don't work, and do we have plans to use PTF for model functions?


2. What kind of object does the model map to in SQL? A relation or a data type?
It looks like a data type because we use it as a parameter of the table function.
If it is a data type, how does it cooperate with type inference[1]?


3. What built-in model functions will we support? How to define a user-defined model function?

4. What built-in model types will we support? How to define a user-defined model type?

5. Regarding the remote model, what providers will we support? Can users implement 3rd-party providers other than OpenAI?

Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference





On Wed, 13 Mar 2024 at 05:55, Hao Li <[email protected]> wrote:

Hi, Dev


Mingge, Chris and I would like to start a discussion about FLIP-437: Support ML Models in Flink SQL.

This FLIP proposes supporting machine learning models in Flink SQL syntax, so that users can CRUD models with Flink SQL and use models on Flink to do prediction with Flink data. The FLIP also proposes new model entities and changes to the catalog interface to support model CRUD operations in the catalog.

For more details, see FLIP-437 [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL


Thanks,
Mingge, Chris & Hao
