Hi Hao,

> Can you send me some pointers to
> where the function gets the table information?

Here is the code for the cumulate window's type checking [1].

> Also is it possible to support <query_stmt> in
> window functions in addition to table?

Yes, it should be possible, although it is currently not allowed in TVFs.

Thanks for the syntax links to other systems. The reason I prefer the
Redshift way is that it avoids introducing Model as a relation or a datatype
(referenced as a parameter in a TVF). Model is not a relation because it
cannot be queried directly (e.g., SELECT * FROM model). I'm also unsure
about making Model a datatype, because I don't know what class the model
parameter of the eval method of TableFunction/ScalarFunction should be.
By defining the function together with the model, users can invoke the
function directly without referencing the model name.
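
To make the contrast concrete, the Redshift approach (from the getting-started
guide linked earlier in this thread) looks roughly like this; the names are
illustrative and the clauses are from memory, not exact:

```
-- Model and function are declared together; the function signature
-- (input types and return type) comes from the model definition.
CREATE MODEL my_churn_model
    FUNCTION predict_churn(INT, VARCHAR)
    RETURNS FLOAT
    SAGEMAKER 'my-endpoint'
    IAM_ROLE 'my-role';

-- Callers invoke the function directly: no model name at the call site,
-- and no Model relation or datatype needs to exist in the type system.
SELECT predict_churn(age, plan_type) FROM customers;
```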

Best,
Jark

[1]: https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53

On Fri, 15 Mar 2024 at 02:48, Hao Li <h...@confluent.io.invalid> wrote:

> Hi Jark,
>
> Thanks for the pointers. It's very helpful.
>
> 1. Looks like `tumble` and `hopping` are keywords in the Calcite parser. And the
> syntax `cumulate(TABLE my_table, ...)` needs to get table information from
> the catalog somewhere for type validation etc. Can you send me some pointers to
> where the function gets the table information?
> 2. The ideal syntax for the model function, I think, would be `ML_PREDICT(MODEL
> <model_name>, {TABLE <table_name> | (query_stmt)})`. I think with special
> handling of the `ML_PREDICT` function in the parser/planner, maybe we can do
> this like window functions. But to support the `MODEL` keyword, we need a Calcite
> parser change, I guess. Also is it possible to support <query_stmt> in
> window functions in addition to table?
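>
> To make item 2 concrete, the two invocation forms I have in mind would look
> something like this (hypothetical syntax, assuming a `MODEL` keyword is added):
>
> ```
> -- passing a named table
> SELECT * FROM ML_PREDICT(MODEL my_model, TABLE my_table);
>
> -- passing a query instead of a table
> SELECT * FROM ML_PREDICT(MODEL my_model, (SELECT col1, col2 FROM my_table));
> ```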
>
> For the Redshift syntax, I'm not sure about the purpose of defining the function
> name together with the model. Is it to define the function's input/output schema?
> We have the schema in our CREATE MODEL syntax, and `ML_PREDICT` can handle
> it by getting the model definition. I think our syntax is more concise because it
> uses a generic prediction function. I also did some research, and it's the syntax
> used by Databricks `ai_query` [1], Snowflake `predict` [2], and Azure ML
> `predict` [3].
>
> [1]: https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
> [2]: https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
> [3]: https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current
>
> Thanks,
> Hao
>
> On Wed, Mar 13, 2024 at 8:57 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Hi Mingge, Hao,
> >
> > Thanks for your replies.
> >
> > > PTF is actually the ideal approach for model functions, and we do have
> > > the plans to use PTF for all model functions (including prediction,
> > > evaluation, etc.) once PTF is supported in the Flink SQL Confluent
> > > extension.
> >
> > It sounds like PTF is the ideal way and the table function is a temporary
> > solution which will be dropped in the future.
> > I'm not sure whether we can implement it using PTF in Flink SQL, but we
> > have implemented window functions using PTF [1] and introduced a new
> > window function (called CUMULATE [2]) in Flink SQL based on this. I think
> > it might work to use PTF and implement the model function syntax like this:
> >
> > SELECT * FROM TABLE(ML_PREDICT(
> >   TABLE my_table,
> >   my_model,
> >   col1,
> >   col2
> > ));
> >
> > Besides, did you consider following the approach of AWS Redshift [3], which
> > defines the model function together with the model itself?
> > IIUC, a model is a black box that defines input parameters and output
> > parameters, and so it can be modeled as a function.
> >
> >
> > Best,
> > Jark
> >
> > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session
> > [2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
> > [3]: https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model
> >
> >
> >
> >
> > On Wed, 13 Mar 2024 at 15:00, Hao Li <h...@confluent.io.invalid> wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for your questions. These are good questions!
> > >
> > > 1. The polymorphic table function I was referring to takes a table as
> > > input and outputs a table, so the syntax would be like
> > > ```
> > > SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
> > > ```
> > > As far as I know, this is not supported yet in Flink. So before it's
> > > supported, one option for the predict function is to use a table function
> > > which can output multiple columns:
> > > ```
> > > SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('model', col1, col2))
> > > ```
> > >
> > > 2. Good question. Type inference is hard for the `ML_PREDICT` function
> > > because it takes a model name string as input. I can think of three ways
> > > of doing type inference for it:
> > >    1). Treat the `ML_PREDICT` function as something special: during SQL
> > > parsing or planning time, when it is encountered, we look up the model
> > > named by the first argument in the catalog. Then we can infer the
> > > input/output types for the function.
> > >    2). Define a `MODEL` keyword and use it in the predict function
> > > to indicate that the argument refers to a model, e.g. `ML_PREDICT(MODEL
> > > 'my_model', col1, col2)`.
> > >    3). Create a special type of table function, maybe called
> > > `ModelFunction`, which resolves the model type inference by handling it
> > > specially during parsing or planning time.
> > > 1) is hacky, 2) isn't supported in Flink for functions, and 3) might be a
> > > good option.
> > >
> > > 3. I sketched the `ML_PREDICT` function for inference. But there are
> > > limitations of the function mentioned in 1 and 2. So maybe we don't need
> > > to introduce them as built-in functions until polymorphic table functions
> > > are supported and we can properly deal with type inference.
> > > After that, defining a user-defined model function should also be
> > > straightforward.
> > >
> > > 4. For model types, do you mean 'remote', 'import', 'native' models or
> > > other things?
> > >
> > > 5. We could support popular providers such as 'azureml', 'vertexai',
> > > and 'googleai' as long as we support the `ML_PREDICT` function. Users
> > > should be able to implement 3rd-party providers if they can implement a
> > > function handling the input/output for the provider.
> > >
> > > I think for the model functions, there are still dependencies or hacks we
> > > need to sort out as a built-in function. Maybe we can separate that as a
> > > follow-up if we want to have it built-in, and focus on the model syntax
> > > for this FLIP?
> > >
> > > Thanks,
> > > Hao
> > >
> > > On Tue, Mar 12, 2024 at 10:33 PM Jark Wu <imj...@gmail.com> wrote:
> > >
> > > > Hi Mingge, Chris, Hao,
> > > >
> > > > Thanks for proposing this interesting idea. I think this is a nice step
> > > > towards the AI world for Apache Flink. I don't know much about AI/ML,
> > > > so I may have some stupid questions.
> > > >
> > > > 1. Could you tell us more about why polymorphic table functions (PTFs)
> > > > don't work, and do we have a plan to use PTFs as model functions?
> > > >
> > > > 2. What kind of object does the model map to in SQL? A relation or a
> > > > data type? It looks like a data type because we use it as a parameter
> > > > of the table function. If it is a data type, how does it cooperate with
> > > > type inference [1]?
> > > >
> > > > 3. What built-in model functions will we support? How to define a
> > > > user-defined model function?
> > > >
> > > > 4. What built-in model types will we support? How to define a
> > > > user-defined model type?
> > > >
> > > > 5. Regarding the remote model, what providers will we support? Can
> > > > users implement 3rd-party providers other than OpenAI?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, 13 Mar 2024 at 05:55, Hao Li <h...@confluent.io.invalid> wrote:
> > > >
> > > > > Hi, Dev
> > > > >
> > > > >
> > > > > Mingge, Chris and I would like to start a discussion about FLIP-437:
> > > > > Support ML Models in Flink SQL.
> > > > >
> > > > > This FLIP proposes to support machine learning models in Flink SQL
> > > > > syntax so that users can CRUD models with Flink SQL and use models on
> > > > > Flink to do prediction with Flink data. The FLIP also proposes new
> > > > > model entities and changes to the catalog interface to support model
> > > > > CRUD operations in the catalog.
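> > > > >
> > > > > As a rough illustration (the exact DDL is specified in the FLIP; this
> > > > > sketch may differ from it in details):
> > > > >
> > > > > ```
> > > > > -- declare a remote model with its input/output schema
> > > > > CREATE MODEL my_model
> > > > > INPUT (f1 INT, f2 STRING)
> > > > > OUTPUT (prediction DOUBLE)
> > > > > WITH ('provider' = 'openai');
> > > > > ```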
> > > > >
> > > > > For more details, see FLIP-437 [1]. Looking forward to your feedback.
> > > > >
> > > > >
> > > > > [1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL
> > > > >
> > > > > Thanks,
> > > > > Mingge, Chris & Hao
> > > > >
> > > >
> > >
> >
>
