From: Hao Li
Date: Wednesday, 3 April 2024 at 18:58
To: dev@flink.apache.org
Subject: [EXTERNAL] Re: [DISCUSS] FLIP-437: Support ML Models in Flink SQL
Cross-posting David Radley's comments here from the voting thread:
> I don’t think this counts as an objection, I have some comments. I should
have put this on the discussion thread earlier but have just got to this.
> - I suggest we can put a model version in the model resource. Versions
are notoriously
Thanks, Timo. I'll start a vote tomorrow if there is no further discussion.
Thanks,
Hao
On Thu, Mar 28, 2024 at 9:33 AM Timo Walther wrote:
Hi everyone,
I updated the FLIP according to this discussion.
@Hao Li: Let me know if I made a mistake somewhere. I added some
additional explaining comments about the new PTF syntax.
There are no further objections from my side. If nobody objects, Hao,
feel free to start the voting tomorrow.
Thanks, Hao,
Sounds good to me.
Best,
Jark
On Thu, 28 Mar 2024 at 01:02, Hao Li wrote:
Hi Jark,
I think we can start with supporting popular model providers such as
openai, azureml, sagemaker for remote models.
Thanks,
Hao
On Tue, Mar 26, 2024 at 8:15 PM Jark Wu wrote:
Thanks for the PoC and the updates.
The final syntax looks good to me; at least it is a nice and concise first
step.
SELECT f1, f2, label FROM
  ML_PREDICT(
    input => `my_data`,
    model => `my_cat`.`my_db`.`classifier_model`,
    args => DESCRIPTOR(f1, f2));
Besides, what built-in models w
Hi Timo,
Yeah. For `primary key` and `from table(...)` those are explicitly matched
in parser: [1].
> SELECT f1, f2, label FROM
>   ML_PREDICT(
>     input => `my_data`,
>     model => `my_cat`.`my_db`.`classifier_model`,
>     args => DESCRIPTOR(f1, f2));
This named argument syntax looks good to me
Hi Hao,
> `TABLE(my_data)` and `MODEL(my_cat.my_db.classifier_model)` don't
> work since `TABLE` and `MODEL` are already keywords
This argument doesn't count. The parser supports introducing keywords
that are still non-reserved. For example, this enables using "key" for
both primary key and
Hi Timo,
> Please double check if this is implementable with the current stack. I
fear the parser or validator might not like the "identifier" argument?
I checked this. Currently the validator throws an exception when trying to
get the fully qualified name for `classifier_model`. But since
`SqlValidato
Hi Ahmed,
Looks like the feature freeze time for 1.20 release is June 15th. We can
definitely get the model DDL into 1.20. For predict and evaluate functions,
if we can't get into the 1.20 release, we can get them into the 1.21
release for sure.
Thanks,
Hao
On Mon, Mar 25, 2024 at 1:25 AM Timo
Hi Jark and Hao,
thanks for the information, Jark! Great that the Calcite community
already fixed the problem for us. +1 to adopt the simplified syntax
asap. Maybe even before we upgrade Calcite (i.e. copy over classes), if
upgrading Calcite is too much work right now?
> Is `DESCRIPTOR` a mu
Hi everyone,
+1 for this proposal. I believe it is very useful at the minimum. It would
be even better to have `ML_PREDICT` and `ML_EVALUATE` as built-in PTFs in
this FLIP, as discussed.
IIUC this will be included in the 1.20 roadmap?
Best Regards
Ahmed Hamdy
On Fri, 22 Mar 2024 at 23:54, Hao Li w
Hi Timo and Jark,
I agree Oracle's syntax seems concise and more descriptive. For the
built-in `ML_PREDICT` and `ML_EVALUATE` functions I agree with Jark we can
support them as built-in PTF using `SqlTableFunction` for this FLIP. We can
have a different FLIP discussing user defined PTF and adopt t
Sorry, I mean we can bump the Calcite version if needed in Flink 1.20.
On Fri, 22 Mar 2024 at 22:19, Jark Wu wrote:
Hi Timo,
Introducing user-defined PTF is very useful in Flink, I'm +1 for this.
But I think the ML model FLIP is not blocked by this, because we
can introduce ML_PREDICT and ML_EVALUATE as built-in PTFs
just like TUMBLE/HOP. And support user-defined ML functions as
a future FLIP.
Regarding the si
Hi everyone,
this is a very important change to the Flink SQL syntax but we can't
wait until the SQL standard is ready for this. So I'm +1 on introducing
the MODEL concept as a first class citizen in Flink.
For your information: Over the past months I have already spent a
significant amount
Thanks Jark for all the insightful comments.
We have updated the proposal per our offline discussions:
1. Model will be treated as a new relation in Flink SQL.
2. Include the common ML predict and evaluate functions in open
source Flink to complete the user journey.
And we should be able
Hi Hao,
> I meant how the table name
in window TVF gets translated to `SqlCallingBinding`. Probably we need to
fetch the table definition from the catalog somewhere. Do we treat those
window TVF specially in parser/planner so that catalog is looked up when
they are seen?
The table names are resol
Hi Jark,
Thanks for the pointer. Sorry for the confusion: I meant how the table name
in window TVF gets translated to `SqlCallingBinding`. Probably we need to
fetch the table definition from the catalog somewhere. Do we treat those
window TVF specially in parser/planner so that catalog is looked u
Hi Hao,
> Can you send me some pointers
where the function gets the table information?
Here is the code of cumulate window type checking [1].
> Also is it possible to support in
window functions in addition to table?
Yes. It is not allowed in TVF.
Thanks for the syntax links of other systems
Hi Jark,
Thanks for the pointers. It's very helpful.
1. It looks like `tumble` and `hopping` are keywords in the Calcite parser. And the
syntax `cumulate(Table my_table, ...)` needs to get table information from
catalog somewhere for type validation etc. Can you send me some pointers
where the function get
Hi Mingge, Hao,
Thanks for your replies.
> PTF is actually the ideal approach for model functions, and we do have
the plans to use PTF for all model functions (including prediction,
evaluation, etc.) once the PTF is supported in the FlinkSQL Confluent
extension.
It sounds that PTF is the ideal way a
Hi Jark,
Thanks for your questions. These are good questions!
1. The polymorphic table function I was referring to takes a table as
input and outputs a table. So the syntax would be like
```
SELECT * FROM ML_PREDICT('model', (SELECT * FROM my_table))
```
As far as I know, this is not supported y
Hi Mingge, Chris, Hao,
Thanks for proposing this interesting idea. I think this is a nice step
towards the AI world for Apache Flink. I don't know much about AI/ML, so
I may have some stupid questions.
1. Could you tell more about why polymorphic table function (PTF) doesn't
work and do we have p