Hi Timo,

Thanks for the FLIP! Also, thanks for saying that this has been in your head for a few years; there is a ton here.
1. For the pass-through semantics, if the partition columns are already listed in the pass-through columns, they are not duplicated, right?

2. A number of places mention that Calcite doesn't support XYZ. Do we have tickets for that work?

3. I like the scoping section. Relative to "More than two tables is out of scope for this FLIP.", will we need to change any of the interfaces in a major way to support multiple output tables in the future?

4. The section "Empty Semantics" under Scoping is a bit terse and I'll admit that I don't understand it. Could you say more in the FLIP there?

5. For Pass Through Semantics, will adding PASS THROUGH later be easy to do? Or is there some reason to avoid it completely?

6. nit: Under `TimedLastValue`, there is a line which is copied from the above: "The on_time descriptor takes two arguments in this case to forward the time attributes of each side."

7. Will upsert mode be possible later? Can you say more about why upsert is not supported? (I can guess, but it seems like a brief discussion in the FLIP would be useful.)

8. In terms of the two bullets at the end of the migration plan, I am for both changing the order of SESSION window columns and changing the name from TIMECOL to on_time (or supporting both names?). Is there any downside to doing so? (See the P.S. below the quoted mail for a rough sketch of what I mean.)

Thanks,
Jim

On Mon, Sep 23, 2024 at 6:38 PM Timo Walther <twal...@apache.org> wrote:

> Hi everyone,
>
> I'm super excited to start a discussion about FLIP-440: User-defined SQL
> operators / ProcessTableFunction (PTF) [1].
>
> This FLIP has been in my head for many years and Flink 2.0 is a good
> time to open up Flink SQL to a new category of use cases. Following the
> principle "Make simple things easy, and complex ones possible", I would
> like to propose a new UDF type "ProcessTableFunction" that gives users
> access to Flink's primitives for advanced stream processing. This should
> unblock people when hitting shortcomings in Flink SQL and expand the
> scope of SQL from analytical to more event-driven applications.
>
> This proposal is by no means a full replacement of DataStream API.
> DataStream API will always provide the full power of Flink whereas PTFs
> provide at least a necessary toolbox to cover ~80% of all use cases
> without leaving the SQL ecosystem. The SQL ecosystem is a great
> foundation with well-defined type system, catalog integration, CDC
> support, and built-in functions/operators. PTFs complete it by offering
> a standard compliant extension point.
>
> Looking forward to your feedback.
>
> Thanks,
> Timo
>
> [1] https://cwiki.apache.org/confluence/x/pQnPEQ
>
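
P.S. Regarding item 8, a rough sketch of what I mean, assuming the Bid example table from the windowing TVF docs (columns bidtime, price, item). The first statement is the session window call roughly as documented today; the second is only my guess at how a renamed on_time argument could be spelled, so the parameter names (data, on_time, gap) and their order are assumptions on my side, not something taken from the FLIP:

    -- Session window TVF roughly as in the current docs (positional form;
    -- TIMECOL is today's name for the descriptor when using named arguments):
    SELECT * FROM TABLE(
      SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES));

    -- Hypothetical named-argument form after a TIMECOL -> on_time rename
    -- (parameter names and order are my guess, not from the FLIP):
    SELECT * FROM TABLE(
      SESSION(
        data    => TABLE Bid PARTITION BY item,
        on_time => DESCRIPTOR(bidtime),
        gap     => INTERVAL '5' MINUTES));

If both TIMECOL and on_time were accepted for a few releases, existing queries would keep working while new ones could move to the new name.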