Thanks, Hao. Sounds good to me.

Best,
Jark

On Thu, 28 Mar 2024 at 01:02, Hao Li <h...@confluent.io.invalid> wrote:

> Hi Jark,
>
> I think we can start with supporting popular model providers such as
> openai, azureml, sagemaker for remote models.
>
> Thanks,
> Hao
>
> On Tue, Mar 26, 2024 at 8:15 PM Jark Wu <imj...@gmail.com> wrote:
>
> > Thanks for the PoC and updating,
> >
> > The final syntax looks good to me, at least it is a nice and concise
> > first step.
> >
> > SELECT f1, f2, label FROM
> >   ML_PREDICT(
> >     input => `my_data`,
> >     model => `my_cat`.`my_db`.`classifier_model`,
> >     args => DESCRIPTOR(f1, f2));
> >
> > Besides, what built-in models will we support in the FLIP? This might be
> > important because it relates to what use cases can run with the new Flink
> > version out of the box.
> >
> > Best,
> > Jark
> >
> > On Wed, 27 Mar 2024 at 01:10, Hao Li <h...@confluent.io.invalid> wrote:
> >
> > > Hi Timo,
> > >
> > > Yeah. For `primary key` and `from table(...)` those are explicitly
> > > matched in parser: [1].
> > >
> > > > SELECT f1, f2, label FROM
> > > >   ML_PREDICT(
> > > >     input => `my_data`,
> > > >     model => `my_cat`.`my_db`.`classifier_model`,
> > > >     args => DESCRIPTOR(f1, f2));
> > >
> > > This named argument syntax looks good to me. It can be supported
> > > together with
> > >
> > > SELECT f1, f2, label FROM ML_PREDICT(`my_data`,
> > > `my_cat`.`my_db`.`classifier_model`, DESCRIPTOR(f1, f2));
> > >
> > > Sure. Will let you know once updated the FLIP.
> > >
> > > [1]
> > > https://github.com/confluentinc/flink/blob/release-1.18-confluent/flink-table/flink-sql-parser/src/main/codegen/includes/parserImpls.ftl#L814
> > >
> > > Thanks,
> > > Hao
> > >
> > > On Tue, Mar 26, 2024 at 4:15 AM Timo Walther <twal...@apache.org> wrote:
> > >
> > > > Hi Hao,
> > > >
> > > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
> > > > > work since `TABLE` and `MODEL` are already key words
> > > >
> > > > This argument doesn't count.
> > > > The parser supports introducing keywords that are still non-reserved.
> > > > For example, this enables using "key" for both primary key and a
> > > > column name:
> > > >
> > > > CREATE TABLE t (i INT PRIMARY KEY NOT ENFORCED)
> > > > WITH ('connector' = 'datagen');
> > > >
> > > > SELECT i AS key FROM t;
> > > >
> > > > I'm sure we will introduce `TABLE(my_data)` eventually as this is what
> > > > the standard dictates. But for now, let's use the most compact syntax
> > > > possible which is also in sync with Oracle.
> > > >
> > > > TLDR: We allow identifiers as arguments for PTFs which are expanded with
> > > > catalog and database if necessary. Those identifier arguments translate
> > > > to catalog lookups for table and models. The ML_ functions will make
> > > > sure that the arguments are of correct type model or table.
> > > >
> > > > SELECT f1, f2, label FROM
> > > >   ML_PREDICT(
> > > >     input => `my_data`,
> > > >     model => `my_cat`.`my_db`.`classifier_model`,
> > > >     args => DESCRIPTOR(f1, f2));
> > > >
> > > > So this will allow us to also use in the future:
> > > >
> > > > SELECT * FROM poly_func(table1);
> > > >
> > > > Same support as Oracle [1]. Very concise.
> > > >
> > > > Let me know when you updated the FLIP for a final review before voting.
> > > >
> > > > Do others have additional objections?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > [1]
> > > > https://livesql.oracle.com/apex/livesql/file/content_HQK7TYEO0NHSJCDY3LN2ERDV6.html
> > > >
> > > > On 25.03.24 23:40, Hao Li wrote:
> > > > > Hi Timo,
> > > > >
> > > > >> Please double check if this is implementable with the current stack. I
> > > > >> fear the parser or validator might not like the "identifier" argument?
> > > > >
> > > > > I checked this, currently the validator throws an exception trying to
> > > > > get the full qualifier name for `classifier_model`.
> > > > > But since `SqlValidatorImpl` is implemented in Flink, we should be
> > > > > able to fix this. The only caveat is that if the full model path is
> > > > > not provided, the qualifier is interpreted as a column. We should be
> > > > > able to special handle this by rewriting the `ml_predict` function to
> > > > > add the catalog and database name in `FlinkCalciteSqlValidator` though.
> > > > >
> > > > >> SELECT f1, f2, label FROM
> > > > >>   ML_PREDICT(
> > > > >>     TABLE `my_data`,
> > > > >>     my_cat.my_db.classifier_model,
> > > > >>     DESCRIPTOR(f1, f2))
> > > > >>
> > > > >> SELECT f1, f2, label FROM
> > > > >>   ML_PREDICT(
> > > > >>     input => TABLE `my_data`,
> > > > >>     model => my_cat.my_db.classifier_model,
> > > > >>     args => DESCRIPTOR(f1, f2))
> > > > >
> > > > > I verified these can be parsed. The problem is in validator for
> > > > > qualifier as mentioned above.
> > > > >
> > > > >> So the safest option would be the long-term solution:
> > > > >>
> > > > >> SELECT f1, f2, label FROM
> > > > >>   ML_PREDICT(
> > > > >>     input => TABLE(my_data),
> > > > >>     model => MODEL(my_cat.my_db.classifier_model),
> > > > >>     args => DESCRIPTOR(f1, f2))
> > > > >
> > > > > `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` doesn't
> > > > > work since `TABLE` and `MODEL` are already key words in calcite used by
> > > > > `CREATE TABLE`, `CREATE MODEL`. Changing to `model_name(...)` works and
> > > > > will be treated as a function.
> > > > >
> > > > > So I think
> > > > >
> > > > > SELECT f1, f2, label FROM
> > > > >   ML_PREDICT(
> > > > >     input => TABLE `my_data`,
> > > > >     model => my_cat.my_db.classifier_model,
> > > > >     args => DESCRIPTOR(f1, f2))
> > > > >
> > > > > should be fine for now.
> > > > >
> > > > > For the syntax part:
> > > > > 1). Sounds good. We can drop model task and model kind from the
> > > > > definition. They can be deduced from the options.
> > > > >
> > > > > 2). Sure.
> > > > > We can add temporary model
> > > > >
> > > > > 3). Make sense. We can use `show create model <name>` to display all
> > > > > information and `describe model <name>` to show input/output schema
> > > > >
> > > > > Thanks,
> > > > > Hao
> > > > >
> > > > > On Mon, Mar 25, 2024 at 3:21 PM Hao Li <h...@confluent.io> wrote:
> > > > >
> > > > >> Hi Ahmed,
> > > > >>
> > > > >> Looks like the feature freeze time for 1.20 release is June 15th. We
> > > > >> can definitely get the model DDL into 1.20. For predict and evaluate
> > > > >> functions, if we can't get into the 1.20 release, we can get them into
> > > > >> the 1.21 release for sure.
> > > > >>
> > > > >> Thanks,
> > > > >> Hao
> > > > >>
> > > > >> On Mon, Mar 25, 2024 at 1:25 AM Timo Walther <twal...@apache.org> wrote:
> > > > >>
> > > > >>> Hi Jark and Hao,
> > > > >>>
> > > > >>> thanks for the information, Jark! Great that the Calcite community
> > > > >>> already fixed the problem for us. +1 to adopt the simplified syntax
> > > > >>> asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if
> > > > >>> upgrading Calcite is too much work right now?
> > > > >>>
> > > > >>> > Is `DESCRIPTOR` a must in the syntax?
> > > > >>>
> > > > >>> Yes, we should still stick to the standard as much as possible and all
> > > > >>> vendors use DESCRIPTOR/COLUMNS for distinguishing columns vs. literal
> > > > >>> arguments. So the final syntax of this discussion would be:
> > > > >>>
> > > > >>> SELECT f1, f2, label FROM
> > > > >>>   ML_PREDICT(TABLE `my_data`, `classifier_model`, DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> SELECT * FROM
> > > > >>>   ML_EVALUATE(TABLE `eval_data`, `classifier_model`, DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> Please double check if this is implementable with the current stack.
> > > > >>> I fear the parser or validator might not like the "identifier"
> > > > >>> argument?
> > > > >>>
> > > > >>> Make sure that also these variations are supported:
> > > > >>>
> > > > >>> SELECT f1, f2, label FROM
> > > > >>>   ML_PREDICT(
> > > > >>>     TABLE `my_data`,
> > > > >>>     my_cat.my_db.classifier_model,
> > > > >>>     DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> SELECT f1, f2, label FROM
> > > > >>>   ML_PREDICT(
> > > > >>>     input => TABLE `my_data`,
> > > > >>>     model => my_cat.my_db.classifier_model,
> > > > >>>     args => DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> It might be safer and more future proof to wrap a MODEL() function
> > > > >>> around it. This would be more in sync with the standard that actually
> > > > >>> still requires to put a TABLE() around the input argument:
> > > > >>>
> > > > >>> ML_PREDICT(TABLE(`my_data`) PARTITIONED BY c1 ORDERED BY c1, ....)
> > > > >>>
> > > > >>> So the safest option would be the long-term solution:
> > > > >>>
> > > > >>> SELECT f1, f2, label FROM
> > > > >>>   ML_PREDICT(
> > > > >>>     input => TABLE(my_data),
> > > > >>>     model => MODEL(my_cat.my_db.classifier_model),
> > > > >>>     args => DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> But I'm fine with this if others have a strong opinion:
> > > > >>>
> > > > >>> SELECT f1, f2, label FROM
> > > > >>>   ML_PREDICT(
> > > > >>>     input => TABLE `my_data`,
> > > > >>>     model => my_cat.my_db.classifier_model,
> > > > >>>     args => DESCRIPTOR(f1, f2))
> > > > >>>
> > > > >>> Some feedback for the remainder of the FLIP:
> > > > >>>
> > > > >>> 1) Simplify catalog objects
> > > > >>>
> > > > >>> I would suggest to drop:
> > > > >>> CatalogModel.getModelKind()
> > > > >>> CatalogModel.getModelTask()
> > > > >>>
> > > > >>> A catalog object should fully resemble the DDL. And since the DDL puts
> > > > >>> those properties in the WITH clause, the catalog object should do the
> > > > >>> same (i.e. put them into the `getModelOptions()`).
> > > > >>> Btw renaming this method to just `getOptions()` for consistency
> > > > >>> should be good as well.
> > > > >>> Internally, we can still provide enums for these frequently used
> > > > >>> classes. Similar to what we do in `FactoryUtil` for other frequently
> > > > >>> used options.
> > > > >>>
> > > > >>> Remove `getDescription()` and `getDetailedDescription()`. They were a
> > > > >>> mistake for CatalogTable and should actually be deprecated. They got
> > > > >>> replaced by `getComment()` which is sufficient.
> > > > >>>
> > > > >>> 2) CREATE TEMPORARY MODEL is not supported.
> > > > >>>
> > > > >>> This is an unnecessary restriction. We should support temporary
> > > > >>> versions of these catalog objects as well for consistency. Adding
> > > > >>> support for this should be straightforward.
> > > > >>>
> > > > >>> 3) { DESCRIBE | DESC } MODEL [catalog_name.][database_name.]model_name
> > > > >>>
> > > > >>> I would suggest we support `SHOW CREATE MODEL` instead. Similar to
> > > > >>> `SHOW CREATE TABLE`, this should show all properties. If we support
> > > > >>> `DESCRIBE MODEL` it should only list the input parameters similar to
> > > > >>> `DESCRIBE TABLE` only shows the columns (not the WITH clause).
> > > > >>>
> > > > >>> Regards,
> > > > >>> Timo
> > > > >>>
> > > > >>> On 23.03.24 13:17, Ahmed Hamdy wrote:
> > > > >>>> Hi everyone,
> > > > >>>> +1 for this proposal, I believe it is very useful to the minimum, It
> > > > >>>> would be great even having "ML_PREDICT" and "ML_EVALUATE" as built-in
> > > > >>>> PTFs in this FLIP as discussed.
> > > > >>>> IIUC this will be included in the 1.20 roadmap?
> > > > >>>> Best Regards
> > > > >>>> Ahmed Hamdy
> > > > >>>>
> > > > >>>> On Fri, 22 Mar 2024 at 23:54, Hao Li <h...@confluent.io.invalid> wrote:
> > > > >>>>
> > > > >>>>> Hi Timo and Jark,
> > > > >>>>>
> > > > >>>>> I agree Oracle's syntax seems concise and more descriptive. For the
> > > > >>>>> built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with Jark
> > > > >>>>> we can support them as built-in PTF using `SqlTableFunction` for this
> > > > >>>>> FLIP. We can have a different FLIP discussing user defined PTF and
> > > > >>>>> adopt that later for model functions. To summarize, the current
> > > > >>>>> proposed syntax is
> > > > >>>>>
> > > > >>>>> SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
> > > > >>>>> `classifier_model`, f1, f2))
> > > > >>>>>
> > > > >>>>> SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
> > > > >>>>> `classifier_model`, f1, f2))
> > > > >>>>>
> > > > >>>>> Is `DESCRIPTOR` a must in the syntax?
> > > > >>>>> If so, it becomes
> > > > >>>>>
> > > > >>>>> SELECT f1, f2, label FROM TABLE(ML_PREDICT(TABLE `my_data`,
> > > > >>>>> `classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2)))
> > > > >>>>>
> > > > >>>>> SELECT * FROM TABLE(ML_EVALUATE(TABLE `eval_data`,
> > > > >>>>> `classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2)))
> > > > >>>>>
> > > > >>>>> If Calcite supports dropping outer table keyword, it becomes
> > > > >>>>>
> > > > >>>>> SELECT f1, f2, label FROM ML_PREDICT(TABLE `my_data`,
> > > > >>>>> `classifier_model`, DESCRIPTOR(f1), DESCRIPTOR(f2))
> > > > >>>>>
> > > > >>>>> SELECT * FROM ML_EVALUATE(TABLE `eval_data`, `classifier_model`,
> > > > >>>>> DESCRIPTOR(f1), DESCRIPTOR(f2))
> > > > >>>>>
> > > > >>>>> Thanks,
> > > > >>>>> Hao
> > > > >>>>>
> > > > >>>>> On Fri, Mar 22, 2024 at 9:16 AM Jark Wu <imj...@gmail.com> wrote:
> > > > >>>>>
> > > > >>>>>> Sorry, I mean we can bump the Calcite version if needed in Flink
> > > > >>>>>> 1.20.
> > > > >>>>>>
> > > > >>>>>> On Fri, 22 Mar 2024 at 22:19, Jark Wu <imj...@gmail.com> wrote:
> > > > >>>>>>
> > > > >>>>>>> Hi Timo,
> > > > >>>>>>>
> > > > >>>>>>> Introducing user-defined PTF is very useful in Flink, I'm +1 for
> > > > >>>>>>> this. But I think the ML model FLIP is not blocked by this, because
> > > > >>>>>>> we can introduce ML_PREDICT and ML_EVALUATE as built-in PTFs
> > > > >>>>>>> just like TUMBLE/HOP. And support user-defined ML functions as
> > > > >>>>>>> a future FLIP.
> > > > >>>>>>>
> > > > >>>>>>> Regarding the simplified PTF syntax which reduces the outer TABLE()
> > > > >>>>>>> keyword, it seems it was just supported[1] by the Calcite community
> > > > >>>>>>> last month and will be released in the next version (v1.37).
> > > > >>>>>>> The Calcite community is preparing the 1.37 release, so we can bump
> > > > >>>>>>> the version if needed in Flink 1.19.
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Jark
> > > > >>>>>>>
> > > > >>>>>>> [1]: https://issues.apache.org/jira/browse/CALCITE-6254
> > > > >>>>>>>
> > > > >>>>>>> On Fri, 22 Mar 2024 at 21:46, Timo Walther <twal...@apache.org> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi everyone,
> > > > >>>>>>>>
> > > > >>>>>>>> this is a very important change to the Flink SQL syntax but we
> > > > >>>>>>>> can't wait until the SQL standard is ready for this. So I'm +1 on
> > > > >>>>>>>> introducing the MODEL concept as a first class citizen in Flink.
> > > > >>>>>>>>
> > > > >>>>>>>> For your information: Over the past months I have already spent a
> > > > >>>>>>>> significant amount of time thinking about how we can introduce
> > > > >>>>>>>> PTFs in Flink. I reserved FLIP-440[1] for this purpose and I will
> > > > >>>>>>>> share a version of this in the next 1-2 weeks.
> > > > >>>>>>>>
> > > > >>>>>>>> For a good implementation of FLIP-440 and also FLIP-437, we should
> > > > >>>>>>>> evolve the PTF syntax in collaboration with Apache Calcite.
> > > > >>>>>>>>
> > > > >>>>>>>> There are different syntax versions out there:
> > > > >>>>>>>>
> > > > >>>>>>>> 1) Flink
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT * FROM
> > > > >>>>>>>>   TABLE(TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));
> > > > >>>>>>>>
> > > > >>>>>>>> 2) SQL standard
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT * FROM
> > > > >>>>>>>>   TABLE(TUMBLE(TABLE(Bid), DESCRIPTOR(bidtime), INTERVAL '10' MINUTES));
> > > > >>>>>>>>
> > > > >>>>>>>> 3) Oracle
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT * FROM
> > > > >>>>>>>>   TUMBLE(Bid, COLUMNS(bidtime), INTERVAL '10' MINUTES);
> > > > >>>>>>>>
> > > > >>>>>>>> As you can see above, Flink does not follow the standard correctly
> > > > >>>>>>>> as it would need to use `TABLE()` but this is not provided by
> > > > >>>>>>>> Calcite yet.
> > > > >>>>>>>>
> > > > >>>>>>>> I really like the Oracle syntax[2][3] a lot. It reduces necessary
> > > > >>>>>>>> keywords to a minimum. Personally, I would like to discuss this
> > > > >>>>>>>> syntax in a separate FLIP and hope I will find supporters for:
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT * FROM
> > > > >>>>>>>>   TUMBLE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL '10' MINUTES);
> > > > >>>>>>>>
> > > > >>>>>>>> If we go entirely with the Oracle syntax, as you can see in the
> > > > >>>>>>>> example, Oracle allows for passing identifiers directly.
> > > > >>>>>>>> This would solve our problems for the MODEL as well:
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT f1, f2, label FROM ML_PREDICT(
> > > > >>>>>>>>   data => `my_data`,
> > > > >>>>>>>>   model => `classifier_model`,
> > > > >>>>>>>>   input => DESCRIPTOR(f1, f2));
> > > > >>>>>>>>
> > > > >>>>>>>> Or we completely adopt the Oracle syntax:
> > > > >>>>>>>>
> > > > >>>>>>>> SELECT f1, f2, label FROM ML_PREDICT(
> > > > >>>>>>>>   data => `my_data`,
> > > > >>>>>>>>   model => `classifier_model`,
> > > > >>>>>>>>   input => COLUMNS(f1, f2));
> > > > >>>>>>>>
> > > > >>>>>>>> What do you think?
> > > > >>>>>>>>
> > > > >>>>>>>> Happy to create a FLIP for just this syntax question and
> > > > >>>>>>>> collaborate with the Calcite community on this. Supporting the
> > > > >>>>>>>> syntax of Oracle shouldn't be too hard to convince at least as
> > > > >>>>>>>> parser parameter.
> > > > >>>>>>>>
> > > > >>>>>>>> Regards,
> > > > >>>>>>>> Timo
> > > > >>>>>>>>
> > > > >>>>>>>> [1]
> > > > >>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/%5BWIP%5D+FLIP-440%3A+User-defined+Polymorphic+Table+Functions
> > > > >>>>>>>> [2]
> > > > >>>>>>>> https://docs.oracle.com/en/database/oracle/oracle-database/19/arpls/DBMS_TF.html#GUID-0F66E239-DE77-4C0E-AC76-D5B632AB8072
> > > > >>>>>>>> [3]
> > > > >>>>>>>> https://oracle-base.com/articles/18c/polymorphic-table-functions-18c
> > > > >>>>>>>>
> > > > >>>>>>>> On 20.03.24 17:22, Mingge Deng wrote:
> > > > >>>>>>>>> Thanks Jark for all the insightful comments.
> > > > >>>>>>>>>
> > > > >>>>>>>>> We have updated the proposal per our offline discussions:
> > > > >>>>>>>>> 1. Model will be treated as a new relation in FlinkSQL.
> > > > >>>>>>>>> 2.
> > > > >>>>>>>>> Include the common ML predict and evaluate functions into the
> > > > >>>>>>>>> open source flink to complete the user journey.
> > > > >>>>>>>>> And we should be able to extend the calcite SqlTableFunction to
> > > > >>>>>>>>> support these two ML functions.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Best,
> > > > >>>>>>>>> Mingge
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, Mar 18, 2024 at 7:05 PM Jark Wu <imj...@gmail.com> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Hao,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> I meant how the table name
> > > > >>>>>>>>>>> in window TVF gets translated to `SqlCallingBinding`. Probably
> > > > >>>>>>>>>>> we need to fetch the table definition from the catalog
> > > > >>>>>>>>>>> somewhere. Do we treat those window TVF specially in
> > > > >>>>>>>>>>> parser/planner so that catalog is looked up when they are seen?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> The table names are resolved and validated by Calcite
> > > > >>>>>>>>>> SqlValidator. We don't need to fetch from catalog manually.
> > > > >>>>>>>>>> The specific checking logic of cumulate window happens in
> > > > >>>>>>>>>> SqlCumulateTableFunction.OperandMetadataImpl#checkOperandTypes.
> > > > >>>>>>>>>> The return type of SqlCumulateTableFunction is defined in
> > > > >>>>>>>>>> #getRowTypeInference() method.
> > > > >>>>>>>>>> Both are public interfaces provided by Calcite and it seems it's
> > > > >>>>>>>>>> not specially handled in parser/planner.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> I didn't try that, but my gut feeling is that the framework is
> > > > >>>>>>>>>> ready to extend a customized TVF.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> For what model is, I'm wondering if it has to be datatype or
> > > > >>>>>>>>>>> relation.
> > > > >>>>>>>>>>> Can it be another kind of citizen parallel to
> > > > >>>>>>>>>>> datatype/relation/function/db?
> > > > >>>>>>>>>>> Redshift also supports `show models` operation, so it seems
> > > > >>>>>>>>>>> it's treated specially as well?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> If it is an entity only used in catalog scope (e.g., show xxx,
> > > > >>>>>>>>>> create xxx, drop xxx), it is fine to introduce it.
> > > > >>>>>>>>>> We have introduced such one before, called Module: "load
> > > > >>>>>>>>>> module", "show modules" [1].
> > > > >>>>>>>>>> But if we want to use Model in TVF parameters, it means it has
> > > > >>>>>>>>>> to be a relation or datatype, because
> > > > >>>>>>>>>> that is what it only accepts now.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thanks for sharing the reason of preferring TVF instead of
> > > > >>>>>>>>>> Redshift way. It sounds reasonable to me.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Jark
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> [1]:
> > > > >>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/modules/
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Fri, 15 Mar 2024 at 13:41, Hao Li <h...@confluent.io.invalid> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi Jark,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks for the pointer. Sorry for the confusion: I meant how
> > > > >>>>>>>>>>> the table name in window TVF gets translated to
> > > > >>>>>>>>>>> `SqlCallingBinding`. Probably we need to fetch the table
> > > > >>>>>>>>>>> definition from the catalog somewhere.
> > > > >>>>>>>>>>> Do we treat those window TVF specially in parser/planner so
> > > > >>>>>>>>>>> that catalog is looked up when they are seen?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> For what model is, I'm wondering if it has to be datatype or
> > > > >>>>>>>>>>> relation. Can it be another kind of citizen parallel to
> > > > >>>>>>>>>>> datatype/relation/function/db?
> > > > >>>>>>>>>>> Redshift also supports `show models` operation, so it seems
> > > > >>>>>>>>>>> it's treated specially as well? The reasons I don't like
> > > > >>>>>>>>>>> Redshift's syntax are:
> > > > >>>>>>>>>>> 1. It's a bit verbose, you need to think of a model name as
> > > > >>>>>>>>>>> well as a function name and the function name also needs to be
> > > > >>>>>>>>>>> unique.
> > > > >>>>>>>>>>> 2. More importantly, prediction function isn't the only
> > > > >>>>>>>>>>> function that can operate on models. There could be a set of
> > > > >>>>>>>>>>> inference functions [1] and evaluation functions [2] which can
> > > > >>>>>>>>>>> operate on models. It's hard to specify all of them in model
> > > > >>>>>>>>>>> creation.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1]:
> > > > >>>>>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-predict
> > > > >>>>>>>>>>> [2]:
> > > > >>>>>>>>>>> https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>> Hao
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Mar 14, 2024 at 8:18 PM Jark Wu <imj...@gmail.com> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Hao,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Can you send me some pointers
> > > > >>>>>>>>>>>>> where the function gets the table information?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Here is the code of cumulate window type checking [1].
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Also is it possible to support <query_stmt> in
> > > > >>>>>>>>>>>>> window functions in addition to table?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Yes. It is not allowed in TVF.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Thanks for the syntax links of other systems. The reason I
> > > > >>>>>>>>>>>> prefer the Redshift way is
> > > > >>>>>>>>>>>> that it avoids introducing Model as a relation or datatype
> > > > >>>>>>>>>>>> (referenced as a parameter in TVF).
> > > > >>>>>>>>>>>> Model is not a relation because it can be queried directly
> > > > >>>>>>>>>>>> (e.g., SELECT * FROM model).
> > > > >>>>>>>>>>>> I'm also confused about making Model as a datatype, because I
> > > > >>>>>>>>>>>> don't know what class the
> > > > >>>>>>>>>>>> model parameter of the eval method of
> > > > >>>>>>>>>>>> TableFunction/ScalarFunction should be. By defining
> > > > >>>>>>>>>>>> the function with the model, users can directly invoke the
> > > > >>>>>>>>>>>> function without reference to the model name.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>> Jark
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> [1]:
> > > > >>>>>>>>>>>> https://github.com/apache/flink/blob/d6c7eee8243b4fe3e593698f250643534dc79cb5/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/sql/SqlCumulateTableFunction.java#L53
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> On Fri, 15 Mar 2024 at 02:48, Hao Li <h...@confluent.io.invalid> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Hi Jark,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks for the pointers. It's very helpful.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> 1. Looks like `tumble`, `hopping` are keywords in calcite
> > > > >>>>>>>>>>>>> parser. And the syntax `cumulate(Table my_table, ...)` needs
> > > > >>>>>>>>>>>>> to get table information from catalog somewhere for type
> > > > >>>>>>>>>>>>> validation etc. Can you send me some pointers where the
> > > > >>>>>>>>>>>>> function gets the table information?
> > > > >>>>>>>>>>>>> 2. The ideal syntax for model function I think would be
> > > > >>>>>>>>>>>>> `ML_PREDICT(MODEL <model_name>, {table <table_name> |
> > > > >>>>>>>>>>>>> (query_stmt) })`.
> > > > >>>>>>>>>>>>> I think with special handling of the `ML_PREDICT` function in
> > > > >>>>>>>>>>>>> parser/planner, maybe we can do this like window functions.
> > > > >>>>>>>>>>>>> But to support `MODEL` keyword, we need calcite parser change
> > > > >>>>>>>>>>>>> I guess. Also is it possible to support <query_stmt> in
> > > > >>>>>>>>>>>>> window functions in addition to table?
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> For the redshift syntax, I'm not sure the purpose of defining
> > > > >>>>>>>>>>>>> the function name with the model. Is it to define the
> > > > >>>>>>>>>>>>> function input/output schema? We have the schema in our
> > > > >>>>>>>>>>>>> create model syntax and the `ML_PREDICT` can handle it by
> > > > >>>>>>>>>>>>> getting model definition. I think our syntax is more concise
> > > > >>>>>>>>>>>>> to have a generic prediction function. I also did some
> > > > >>>>>>>>>>>>> research and it's the syntax used by Databricks `ai_query`
> > > > >>>>>>>>>>>>> [1], Snowflake `predict` [2], Azureml `predict` [3].
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> [1]:
> > > > >>>>>>>>>>>>> https://docs.databricks.com/en/sql/language-manual/functions/ai_query.html
> > > > >>>>>>>>>>>>> [2]:
> > > > >>>>>>>>>>>>> https://github.com/Snowflake-Labs/sfguide-intro-to-machine-learning-with-snowpark-ml-for-python/blob/main/3_snowpark_ml_model_training_inference.ipynb?_fsi=sksXUwQ0
> > > > >>>>>>>>>>>>> [3]:
> > > > >>>>>>>>>>>>> https://learn.microsoft.com/en-us/sql/machine-learning/tutorials/quickstart-python-train-score-model?view=azuresqldb-mi-current
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Thanks,
> > > > >>>>>>>>>>>>> Hao
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Wed, Mar 13, 2024 at 8:57 PM Jark Wu <imj...@gmail.com> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi Mingge, Hao,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Thanks for your replies.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> PTF is actually the ideal approach for model functions, and
> > > > >>>>>>>>>>>>>>> we do have the plans to use PTF for
> > > > >>>>>>>>>>>>>>> all model functions (including prediction, evaluation
> > > > >>>>>>>>>>>>>>> etc..) once the PTF is supported in FlinkSQL
> > > > >>>>>>>>>>>>>>> confluent extension.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> It sounds that PTF is the ideal way and table function is a
> > > > >>>>>>>>>>>>>> temporary solution which will be dropped in the future.
> > > > >>>>>>>>>>>>>> I'm not sure whether we can implement it using PTF in Flink
> > > > >>>>>>>>>>>>>> SQL. But we have implemented window
> > > > >>>>>>>>>>>>>> functions using PTF[1]. And introduced a new window function
> > > > >>>>>>>>>>>>>> (called CUMULATE[2]) in Flink SQL based
> > > > >>>>>>>>>>>>>> on this. I think it might work to use PTF and implement
> > > > >>>>>>>>>>>>>> model function syntax like this:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> SELECT * FROM TABLE(ML_PREDICT(
> > > > >>>>>>>>>>>>>>   TABLE my_table,
> > > > >>>>>>>>>>>>>>   my_model,
> > > > >>>>>>>>>>>>>>   col1,
> > > > >>>>>>>>>>>>>>   col2
> > > > >>>>>>>>>>>>>> ));
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Besides, did you consider following the way of AWS Redshift
> > > > >>>>>>>>>>>>>> which defines model function with the model itself together?
> > > > >>>>>>>>>>>>>> IIUC, a model is a black-box which defines input parameters
> > > > >>>>>>>>>>>>>> and output parameters which can be modeled into functions.

Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/window-tvf/#session
[2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
[3]: https://github.com/aws-samples/amazon-redshift-ml-getting-started/blob/main/use-cases/bring-your-own-model-remote-inference/README.md#create-model

On Wed, 13 Mar 2024 at 15:00, Hao Li <h...@confluent.io.invalid> wrote:

Hi Jark,

Thanks for your questions. These are good questions!

1. The polymorphic table function I was referring to takes a table as
input and outputs a table.
So the syntax would be like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported yet in Flink. So before it's
supported, one option for the predict function is to use a table function,
which can output multiple columns
```
SELECT * FROM my_table, LATERAL TABLE(ML_PREDICT('model', col1, col2))
```

2. Good question. Type inference is hard for the `ML_PREDICT` function
because it takes a model name string as input. I can think of three ways
of doing type inference for it.
   1). Treat the `ML_PREDICT` function as something special: if it's
encountered during SQL parsing or planning, we look up the model in the
catalog using the first argument, which is the model name. Then we can
infer the input/output for the function.
   2). We can define a `model` keyword and use that in the predict function
to indicate that the argument refers to a model. So it's like
`ML_PREDICT(model 'my_model', col1, col2)`
   3).
We can create a special type of table function, maybe called
`ModelFunction`, which can resolve the model type inference by handling it
specially during parsing or planning time.
1) is hacky, 2) isn't supported in Flink for functions, 3) might be a
good option.

3. I sketched the `ML_PREDICT` function for inference. But there are
limitations of the function mentioned in 1 and 2. So maybe we don't need
to introduce them as built-in functions until polymorphic table functions
are supported and we can properly deal with type inference.
After that, defining a user-defined model function should also be
straightforward.

4. For model types, do you mean 'remote', 'import', 'native' models or
other things?

5. We could support popular providers such as 'azureml', 'vertexai',
'googleai' as long as we support the `ML_PREDICT` function. Users should
be able to implement 3rd-party providers if they can implement a function
handling the input/output for the provider.
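
As a rough illustration of options 1) and 3) in point 2, the catalog
lookup at planning time could work along these lines. This is a minimal
Python sketch, not Flink code: `ModelSchema`, `CATALOG` and
`infer_ml_predict_output` are hypothetical names, the in-memory catalog
and its single entry are made up for the example, and the assumption that
the output row is the input columns followed by the model outputs is
mine, not something settled in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSchema:
    """Hypothetical catalog entry: declared inputs and outputs of a model."""
    inputs: tuple   # ((parameter name, type), ...)
    outputs: tuple  # ((column name, type), ...)


# Hypothetical in-memory catalog keyed by model name (illustrative entry).
CATALOG = {
    "classifier_model": ModelSchema(
        inputs=(("f1", "INT"), ("f2", "STRING")),
        outputs=(("label", "STRING"),),
    ),
}


def infer_ml_predict_output(model_name, table_schema, arg_columns):
    """Resolve the model by name and derive the output row type.

    table_schema: dict mapping input table column name -> type.
    arg_columns: input table columns passed to the model, in order.
    """
    if model_name not in CATALOG:
        raise LookupError(f"model not found in catalog: {model_name}")
    model = CATALOG[model_name]

    if len(arg_columns) != len(model.inputs):
        raise TypeError(
            f"{model_name} expects {len(model.inputs)} arguments, "
            f"got {len(arg_columns)}"
        )

    # Check each passed column against the declared model parameter type.
    for column, (param, expected) in zip(arg_columns, model.inputs):
        if column not in table_schema:
            raise KeyError(f"unknown column: {column}")
        actual = table_schema[column]
        if actual != expected:
            raise TypeError(
                f"column {column} has type {actual}, but model parameter "
                f"{param} expects {expected}"
            )

    # Assumed output shape: input table columns, then model output columns.
    return list(table_schema.items()) + list(model.outputs)
```

The point of the sketch is only that the function's result type cannot be
known from the function signature alone; it depends on the model that the
name resolves to, which is why the planner needs special handling.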

I think for the model functions, there are still dependencies or hacks we
need to sort out before they can be built-in functions. Maybe we can
separate that out as a follow-up if we want to have it built in, and focus
on the model syntax for this FLIP?

Thanks,
Hao

On Tue, Mar 12, 2024 at 10:33 PM Jark Wu <imj...@gmail.com> wrote:

Hi Mingge, Chris, Hao,

Thanks for proposing this interesting idea. I think this is a nice step
towards the AI world for Apache Flink. I don't know much about AI/ML, so I
may have some stupid questions.

1. Could you tell more about why polymorphic table functions (PTF) don't
work, and do we have a plan to use PTF for model functions?

2. What kind of object does the model map to in SQL? A relation or a data
type?
It looks like a data type because we use it as a parameter of the table
function.
If it is a data type, how does it cooperate with type inference[1]?

3. What built-in model functions will we support? How to define a
user-defined model function?

4. What built-in model types will we support? How to define a
user-defined model type?

5. Regarding the remote model, what providers will we support? Can users
implement 3rd-party providers except OpenAI?

Best,
Jark

[1]: https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/functions/udfs/#type-inference

On Wed, 13 Mar 2024 at 05:55, Hao Li <h...@confluent.io.invalid> wrote:

Hi, Dev

Mingge, Chris and I would like to start a discussion about FLIP-437:
Support ML Models in Flink SQL.

This FLIP proposes supporting machine learning models in Flink SQL
syntax so that users can CRUD models with Flink SQL and use models on
Flink to do prediction with Flink data. The FLIP also proposes new model
entities and changes to the catalog interface to support model CRUD
operations in the catalog.

For more details, see FLIP-437 [1]. Looking forward to your feedback.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-437%3A+Support+ML+Models+in+Flink+SQL

Thanks,
Mingge, Chris & Hao