Hi everyone,

I updated the FLIP according to this discussion.

@Hao Li: Let me know if I made a mistake somewhere. I added some additional explaning comments about the new PTF syntax.

There are no further objections from my side. If nobody objects, Hao feel free to start the voting tomorrow.

Regards,
Timo


On 28.03.24 16:30, Jark Wu wrote:
Thanks, Hao,

Sounds good to me.

Best,
Jark

On Thu, 28 Mar 2024 at 01:02, Hao Li <h...@confluent.io.invalid> wrote:

Hi Jark,

I think we can start with supporting popular model providers such as
openai, azureml, sagemaker for remote models.

Thanks,
Hao

On Tue, Mar 26, 2024 at 8:15 PM Jark Wu <imj...@gmail.com> wrote:

Thanks for the PoC and updating,

The final syntax looks good to me, at least it is a nice and concise
first
step.

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

Besides, what built-in models will we support in the FLIP? This might be
important
because it relates to what use cases can run with the new Flink version
out
of the box.

Best,
Jark

On Wed, 27 Mar 2024 at 01:10, Hao Li <h...@confluent.io.invalid> wrote:

Hi Timo,

Yeah. For `primary key` and `from table(...)` those are explicitly
matched
in parser: [1].

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

This named argument syntax looks good to me. It can be supported
together
with

SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
`my_cat`.`my_db`.`classifier_model`,DESCRIPTOR(f1, f2));

Sure. Will let you know once updated the FLIP.

[1]



https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814

Thanks,
Hao

On Tue, Mar 26, 2024 at 4:15 AM Timo Walther <twal...@apache.org>
wrote:

Hi Hao,

  > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)`
doesn't
  > work since `TABLE` and `MODEL` are already key words

This argument doesn't count. The parser supports introducing keywords
that are still non-reserved. For example, this enables using "key"
for
both primary key and a column name:

CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
WITH ('connector' = 'datagen');

SELECT i AS key FROM t;

I'm sure we will introduce `TABLE(my_data)` eventually as this is
what
the standard dictates. But for now, let's use the most compact syntax
possible which is also in sync with Oracle.

TLDR: We allow identifiers as arguments for PTFs which are expanded
with
catalog and database if necessary. Those identifier arguments
translate
to catalog lookups for table and models. The ML_ functions will make
sure that the arguments are of correct type model or table.

SELECT f1, f2, label FROM
    ML_PREDICT(
      input => `my_data`,
      model => `my_cat`.`my_db`.`classifier_model`,
      args => DESCRIPTOR(f1, f2));

So this will allow us to also use in the future:

SELECT * FROM poly_func(table1);

Same support as Oracle [1]. Very concise.

Let me know when you updated the FLIP for a final review before
voting.

Do others have additional objections?

Regards,
Timo

[1]




https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html



On 25.03.24 23:40, Hao Li wrote:
Hi Timo,

Please double check if this is implementable with the current
stack. I
fear the parser or validator might not like the "identifier"
argument?

I checked this, currently the validator throws an exception trying
to
get
the full qualifier name for `classifier_model`. But since
`SqlValidatorImpl` is implemented in Flink, we should be able to
fix
this.
The only caveator is if not full model path is provided,
the qualifier is interpreted as a column. We should be able to
special
handle this by rewriting the `ml_predict` function to add the
catalog
and
database name in `FlinkCalciteSqlValidator` though.

SELECT f1, f2, label FROM
     ML_PREDICT(
       TABLE `my_data`,
       my_cat.my_db.classifier_model,
       DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

I verified these can be parsed. The problem is in validator for
qualifier
as mentioned above.

So the safest option would be the long-term solution:

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE(my_data),
       model => MODEL(my_cat.my_db.classifier_model),
       args => DESCRIPTOR(f1, f2))

`TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
work
since `TABLE` and `MODEL` are already key words in calcite used by
`CREATE
TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and
will
be
treated as a function.

So I think

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))
should be fine for now.

For the syntax part:
1). Sounds good. We can drop model task and model kind from the
definition.
They can be deduced from the options.

2). Sure. We can add temporary model

3). Make sense. We can use `show create model <name>` to display
all
information and `describe model <name>` to show input/output schema

Thanks,
Hao

On Mon, Mar 25, 2024 at 3:21 PM Hao Li <h...@confluent.io> wrote:

Hi Ahmed,

Looks like the feature freeze time for 1.20 release is June 15th.
We
can
definitely get the model DDL into 1.20. For predict and evaluate
functions,
if we can't get into the 1.20 release, we can get them into the
1.21
release for sure.

Thanks,
Hao



On Mon, Mar 25, 2024 at 1:25 AM Timo Walther <twal...@apache.org>
wrote:

Hi Jark and Hao,

thanks for the information, Jark! Great that the Calcite
community
already fixed the problem for us. +1 to adopt the simplified
syntax
asap. Maybe even before we upgrade Calcite (i.e. copy over
classes),
if
upgrading Calcite is too much work right now?

   > Is `DESCRIPTOR` a must in the syntax?

Yes, we should still stick to the standard as much as possible
and
all
vendors use DESCRIPTOR/COLUMNS for distinuishing columns vs.
literal
arguments. So the final syntax of this discussion would be:


SELECT f1, f2, label FROM
     ML_PREDICT(TABLE `my_data`, `classifier_model`,
DESCRIPTOR(f1,
f2))

SELECT * FROM
     ML_EVALUATE(TABLE `eval_data`, `classifier_model`,
DESCRIPTOR(f1,
f2))

Please double check if this is implementable with the current
stack.
I
fear the parser or validator might not like the "identifier"
argument?

Make sure that also these variations are supported:

SELECT f1, f2, label FROM
     ML_PREDICT(
       TABLE `my_data`,
       my_cat.my_db.classifier_model,
       DESCRIPTOR(f1, f2))

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

It might be safer and more future proof to wrap a MODEL()
function
around it. This would be more in sync with the standard that
actually
still requires to put a TABLE() around the input argument:

ML_PREDICT(TABLE(`my_data`) PARTITIONED BY c1 ORDERED BY c1,
....)

So the safest option would be the long-term solution:

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE(my_data),
       model => MODEL(my_cat.my_db.classifier_model),
       args => DESCRIPTOR(f1, f2))

But I'm fine with this if others have a strong opinion:

SELECT f1, f2, label FROM
     ML_PREDICT(
       input => TABLE `my_data`,
       model => my_cat.my_db.classifier_model,
       args => DESCRIPTOR(f1, f2))

Some feedback for the remainder of the FLIP:

1) Simplify catalog objects

I would suggest to drop:
CatalogModel.getModelKind()
CatalogModel.getModelTask()

A catalog object should fully resemble the DDL. And since the DDL
puts
those properties in the WITH clause, the catalog object should
the
same
(i.e. put them into the `getModelOptions()`). Btw renaming this
method
to just `getOptions()` for consistency should be good as well.
Internally, we can still provide enums for these frequently used
classes. Similar to what we do in `FactoryUtil` for other
frequently
used options.

Remove `getDescription()` and `getDetailedDescription()`. They
were a
mistake for CatalogTable and should actually be deprecated. They
got
replaced by `getComment()` which is sufficient.

2) CREATE TEMPORARY MODEL is not supported.

This is an unnecessary restriction. We should support temporary
versions
of these catalog objects as well for consistency. Adding support
for
this should be straightforward.

3) DESCRIBE | DESC } MODEL
[catalog_name.][database_name.]model_name

I would suggest we support `SHOW CREATE MODEL` instead. Similar
to
`SHOW
CREATE TABLE`, this should show all properties. If we support
`DESCRIBE
MODEL` it should only list the input parameters similar to
`DESCRIBE
TABLE` only shows the columns (not the WITH clause).

Regards,
Timo


On 23.03.24 13:17, Ahmed Hamdy wrote:
Hi everyone,
+1 for this proposal, I believe it is very useful to the
minimum,
It
would
be great even having  "ML_PREDICT" and "ML_EVALUATE" as built-in
PTFs
in
this FLIP as discussed.
IIUC this will be included in the 1.20 roadmap?
Best Regards
Ahmed Hamdy


On Fri, 22 Mar 2024 at 23:54, Hao Li <h...@confluent.io.invalid>
wrote:

Hi Timo and Jark,

I agree Oracle's syntax seems concise and more descriptive. For
the
built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with
Jark
we
can
support them as built-in PTF using `SqlTableFunction` for this
FLIP.
We can
have a different FLIP discussing user defined PTF and adopt
that
later
for
model functions later. To summarize, the current proposed
syntax
is

SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
`classifier_model`, f1, f2))

SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
`classifier_model`,
f1,
f2))

Is `DESCRIPTOR` a must in the syntax? If so, it becomes

SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
`classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2)))

SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
`classifier_model`,
DESCRIPTOR(f1), DESCRIPTOR(f2)))

If Calcite supports dropping outer table keyword, it becomes

SELECT f1, f2, label FROM ML_PREDICT(TABLE `my_data`,
`classifier_model`,
DESCRIPTOR(f1), DESCRIPTOR(f2))

SELECT * FROM ML_EVALUATE(TABLE `eval_data`,
`classifier_model`,
DESCRIPTOR(
f1), DESCRIPTOR(f2))

Thanks,
Hao



On Fri, Mar 22, 2024 at 9:16 AM Jark Wu <imj...@gmail.com>
wrote:

Sorry, I mean we can bump the Calcite version if needed in
Flink
1.20.

On Fri, 22 Mar 2024 at 22:19, Jark Wu <imj...@gmail.com>
wrote:

Hi Timo,

Introducing user-defined PTF is very useful in Flink, I'm +1
for
this.
But I think the ML model FLIP is not blocked by this, because
we
can introduce ML_PREDICT and ML_EVALUATE as built-in PTFs
just like TUMBLE/HOP. And support user-defined ML functions
as
a future FLIP.

Regarding the simplified PTF syntax which reduces the outer
TABLE()
keyword,
it seems it was just supported[1] by the Calcite community
last
month
and
will be
released in the next version (v1.37). The Calcite community
is
preparing
the
1.37 release, so we can bump the version if needed in Flink
1.19.

Best,
Jark

[1]: https://issues.apache.org/jira/browse/CALCITE-6254

On Fri, 22 Mar 2024 at 21:46, Timo Walther <
twal...@apache.org

wrote:

Hi everyone,

this is a very important change to the Flink SQL syntax but
we
can't
wait until the SQL standard is ready for this. So I'm +1 on
introducing
the MODEL concept as a first class citizen in Flink.

For your information: Over the past months I have already
spent
a
significant amount of time thinking about how we can
introduce
PTFs
in
Flink. I reserved FLIP-440[1] for this purpose and I will
share
a
version of this in the next 1-2 weeks.

For a good implementation of FLIP-440 and also FLIP-437, we
should
evolve the PTF syntax in collaboration with Apache Calcite.

There are different syntax versions out there:

1) Flink

SELECT * FROM
      TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL
'10'
MINUTES));

2) SQL standard

SELECT * FROM
      TABLE(TUMBLE(TABLE(Bid), DESCRIPTOR(bidtime), INTERVAL
'10'
MINUTES));

3) Oracle

SELECT * FROM
       TUMBLE(Bid, COLUMNS(bidtime), INTERVAL '10' MINUTES));

As you can see above, Flink does not follow the standard
correctly
as
it
would need to use `TABLE()` but this is not provided by
Calcite
yet.

I really like the Oracle syntax[2][3] a lot. It reduces
necessary
keywords to a minimum. Personally, I would like to discuss
this
syntax
in a separate FLIP and hope I will find supporters for:


SELECT * FROM
      TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10'
MINUTES);

If we go entirely with the Oracle syntax, as you can see in
the
example,
Oracle allows for passing identifiers directly. This would
solve
our
problems for the MODEL as well:

SELECT f1, f2, label FROM ML_PREDICT(
      data => `my_data`,
      model => `classifier_model`,
      input => DESCRIPTOR(f1, f2));

Or we completely adopt the Oracle syntax:

SELECT f1, f2, label FROM ML_PREDICT(
      data => `my_data`,
      model => `classifier_model`,
      input => COLUMNS(f1, f2));


What do you think?

Happy to create a FLIP for just this syntax question and
collaborate
with the Calcite community on this. Supporting the syntax of
Oracle
shouldn't be too hard to convince at least as parser
parameter.

Regards,
Timo

[1]








https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-440%3A+User-defined+Polymorphic+Table+Functions
[2]








https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_TF.html#GUID-0F66E239-DE77-4C0E-AC76-D5B632AB8072
[3]

https://oracle-base.com/articles/18c/polymorphic-table-functions-18c



On 20.03.24 17:22, Mingge Deng wrote:
Thanks Jark for all the insightful comments.

We have updated the proposal per our offline discussions:
1. Model will be treated as a new relation in FlinkSQL.
2. Include the common ML predict and evaluate functions
into
the
open
source flink to complete the user journey.
        And we should be able to extend the calcite
SqlTableFunction
to
support
these two ML functions.

Best,
Mingge

On Mon, Mar 18, 2024 at 7:05 PM Jark Wu <imj...@gmail.com>
wrote:

Hi Hao,

I meant how the table name
in window TVF gets translated to `SqlCallingBinding`.
Probably
we
need
to
fetch the table definition from the catalog somewhere. Do
we
treat
those
window TVF specially in parser/planner so that catalog is
looked
up
when
they are seen?

The table names are resolved and validated by Calcite
SqlValidator.
We
don' need to fetch from catalog manually.
The specific checking logic of cumulate window happens in

SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes.
The return type of SqlCumulateTableFunction is defined in
#getRowTypeInference() method.
Both are public interfaces provided by Calcite and it
seems
it's
not
specially handled in parser/planner.

I didn't try that, but my gut feeling is that the
framework
is
ready
to
extend a customized TVF.

For what model is, I'm wondering if it has to be datatype
or
relation.
Can
it be another kind of citizen parallel to
datatype/relation/function/db?
Redshift also supports `show models` operation, so it
seems
it's
treated
specially as well?

If it is an entity only used in catalog scope (e.g., show
xxx,
create
xxx,
drop xxx), it is fine to introduce it.
We have introduced such one before, called Module: "load
module",
"show
modules" [1].
But if we want to use Model in TVF parameters, it means it
has
to
be
a
relation or datatype, because
that is what it only accepts now.

Thanks for sharing the reason of preferring TVF instead of
Redshift
way. It
sounds reasonable to me.

Best,
Jark

     [1]:









https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/

On Fri, 15 Mar 2024 at 13:41, Hao Li
<h...@confluent.io.invalid

wrote:

Hi Jark,

Thanks for the pointer. Sorry for the confusion: I meant
how
the
table
name
in window TVF gets translated to `SqlCallingBinding`.
Probably
we
need to
fetch the table definition from the catalog somewhere. Do
we
treat
those
window TVF specially in parser/planner so that catalog is
looked
up
when
they are seen?

For what model is, I'm wondering if it has to be datatype
or
relation.
Can
it be another kind of citizen parallel to
datatype/relation/function/db?
Redshift also supports `show models` operation, so it
seems
it's
treated
specially as well? The reasons I don't like Redshift's
syntax
are:
1. It's a bit verbose, you need to think of a model name
as
well
as
a
function name and the function name also needs to be
unique.
2. More importantly, prediction function isn't the only
function
that
can
operate on models. There could be a set of inference
functions
[1]
and
evaluation functions [2] which can operate on models.
It's
hard
to
specify
all of them in model creation.

[1]:










https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
[2]:










https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate

Thanks,
Hao

On Thu, Mar 14, 2024 at 8:18 PM Jark Wu <
imj...@gmail.com>
wrote:

Hi Hao,

Can you send me some pointers
where the function gets the table information?

Here is the code of cumulate window type checking [1].

Also is it possible to support <query_stmt> in
window functions in addiction to table?

Yes. It is not allowed in TVF.

Thanks for the syntax links of other systems. The
reason I
prefer
the
Redshift way is
that it avoids introducing Model as a relation or
datatype
(referenced
as a
parameter in TVF).
Model is not a relation because it can be queried
directly
(e.g.,
SELECT
*
FROM model).
I'm also confused about making Model as a datatype,
because
I
don't
know
what class the
model parameter of the eval method of
TableFunction/ScalarFunction
should
be. By defining
the function with the model, users can directly invoke
the
function
without
reference to the model name.

Best,
Jark

[1]:











https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53

On Fri, 15 Mar 2024 at 02:48, Hao Li
<h...@confluent.io.invalid

wrote:

Hi Jark,

Thanks for the pointers. It's very helpful.

1. Looks like `tumble`, `hopping` are keywords in
calcite
parser.
And
the
syntax `cumulate(Table my_table, ...)` needs to get
table
information
from
catalog somewhere for type validation etc. Can you send
me
some
pointers
where the function gets the table information?
2. The ideal syntax for model function I think would be
`ML_PREDICT(MODEL
<model_name>, {table <table_name> | (query_stmt) })`. I
think
with
special
handling of the `ML_PREDICT` function in
parser/planner,
maybe
we
can
do
this like window functions. But to support `MODEL`
keyword,
we
need
calcite
parser change I guess. Also is it possible to support
<query_stmt>
in
window functions in addiction to table?

For the redshift syntax, I'm not sure the purpose of
defining
the
function
name with the model. Is it to define the function
input/output
schema?
We
have the schema in our create model syntax and the
`ML_PREDICT`
can
handle
it by getting model definition. I think our syntax is
more
concise
to
have
a generic prediction function. I also did some research
and
it's
the
syntax
used by Databricks `ai_query` [1], Snowflake `predict`
[2],
Azureml
`predict` [3].

[1]:











https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
[2]:












https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
[3]:












https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current

Thanks,
Hao

On Wed, Mar 13, 2024 at 8:57 PM Jark Wu <
imj...@gmail.com>
wrote:

Hi Mingge, Hao,

Thanks for your replies.

PTF is actually the ideal approach for model
functions,
and
we
do
have
the plans to use PTF for
all model functions (including prediction, evaluation
etc..)
once
the
PTF
is supported in FlinkSQL
confluent extension.

It sounds that PTF is the ideal way and table function
is
a
temporary
solution which will be dropped in the future.
I'm not sure whether we can implement it using PTF in
Flink
SQL.
But
we
have implemented window
functions using PTF[1]. And introduced a new window
function
(called
CUMULATE[2]) in Flink SQL based
on this. I think it might work to use PTF and
implement
model
function
syntax like this:

SELECT * FROM TABLE(ML_PREDICT(
      TABLE my_table,
      my_model,
      col1,
      col2
));

Besides, did you consider following the way of AWS
Redshift
which
defines
model function with the model itself together?
IIUC, a model is a black-box which defines input
parameters
and
output
parameters which can be modeled into functions.


Best,
Jark

[1]:













https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session
[2]:













https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
[3]:













https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model




On Wed, 13 Mar 2024 at 15:00, Hao Li
<h...@confluent.io.invalid

wrote:

Hi Jark,

Thanks for your questions. These are good questions!

1. The polymorphism table function I was referring to
takes a
table
as
input and outputs a table. So the syntax would be
like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM
my_table))
```
As far as I know, this is not supported yet on Flink.
So
before
it's
supported, one option for the predict function is
using
table
function
which can output multiple columns
```
SELECT * FROM my_table, LATERAL VIEW
(ML_PREDICT('model',
col1,
col2))
```

2. Good question. Type inference is hard for the
`ML_PREDICT`
function
because it takes a model name string as input. I can
think
of
three
ways
of
doing type inference for it.
       1). Treat `ML_PREDICT` function as something
special
and
during
sql
parsing or planning time, if it's encountered, we
need
to
look
up
the
model
from the first argument which is a model name from
catalog.
Then
we
can
infer the input/output for the function.
       2). We can define a `model` keyword and use
that
in
the
predict
function
to indicate the argument refers to a model. So it's
like
`ML_PREDICT(model
'my_model', col1, col2))`
       3). We can create a special type of table
function
maybe
called
`ModelFunction` which can resolve the model type
inference
by
special
handling it during parsing or planning time.
1) is hacky, 2) isn't supported in Flink for
function,
3)
might
be
a
good option.

3. I sketched the `ML_PREDICT` function for
inference.
But
there
are
limitations of the function mentioned in 1 and 2. So
maybe
we
don't
need
to
introduce them as built-in functions until
polymorphism
table
function
and
we can properly deal with type inference.
After that, defining a user-defined model function
should
also
be
straightforward.

4. For model types, do you mean 'remote', 'import',
'native'
models
or
other things?

5. We could support popular providers such as
'azureml',
'vertexai',
'googleai' as long as we support the `ML_PREDICT`
function.
Users
should
be
able to implement 3rd-party providers if they can
implement a
function
handling the input/output for the provider.

I think for the model functions, there are still
dependencies
or
hacks
we
need to sort out as a built-in function. Maybe we can
separate
that
as
a
follow up if we want to have it built-in and focus on
the
model
syntax
for
this FLIP?

Thanks,
Hao

On Tue, Mar 12, 2024 at 10:33 PM Jark Wu <
imj...@gmail.com

wrote:

Hi Minge, Chris, Hao,

Thanks for proposing this interesting idea. I think
this
is
a
nice
step
towards
the AI world for Apache Flink. I don't know much
about
AI/ML,
so
I
may
have
some stupid questions.

1. Could you tell more about why polymorphism table
function
(PTF)
doesn't
work and do we have plan to use PTF as model
functions?

2. What kind of object does the model map to in
SQL? A
relation
or
a
data
type?
It looks like a data type because we use it as a
parameter
of
the
table
function.
If it is a data type, how does it cooperate with
type
inference[1]?

3. What built-in model functions will we support?
How
to
define a
user-defined model function?

4. What built-in model types will we support? How to
define
a
user-defined
model type?

5. Regarding the remote model, what providers will
we
support?
Can
users
implement
3rd-party providers except OpenAI?

Best,
Jark

[1]:















https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference




On Wed, 13 Mar 2024 at 05:55, Hao Li
<h...@confluent.io.invalid

wrote:

Hi, Dev


Mingge, Chris and I would like to start a
discussion
about
FLIP-437:
Support ML Models in Flink SQL.

This FLIP is proposing to support machine learning
models
in
Flink
SQL
syntax so that users can CRUD models with Flink SQL
and
use
models
on
Flink
to do prediction with Flink data. The FLIP also
proposes
new
model
entities
and changes to catalog interface to support model
CRUD
operations
in
catalog.

For more details, see FLIP-437 [1]. Looking forward
to
your
feedback.


[1]
















https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL

Thanks,
Minge, Chris & Hao

























Reply via email to