Thanks Timo's explain. > Can you give an example for a complex temporal condition?
For example, consider the following SQL statement: SELECT * FROM src LEFT JOIN dim FOR SYSTEM_TIME AS OF PROCTIME() ON src.key = dim.key1 AND dim.key2 = 'literal_value'; In this scenario, I am uncertain how to replicate the same semantics using the SEARCH_KEY syntax. > Naming of the SEARCH_KEY I think Lincoln prefer to use LOOKUP to replace SEARCH_KEY only. :) Best Shengkai Timo Walther <twal...@apache.org> 于2025年4月17日周四 14:33写道: > Hi everyone, > > great to see that this discussion gains momentum. Let me answer your > questions below: > > @Hao > > 1. SEARCH_KEY planner representation and options > > Yes, internally nothing in the planner should change. The PTFs are just > syntactic sugar to rules and nodes that already exist. Regarding the > options, we already support passing options via lookup hints. See also > [1]. The option keys are in sync with the hints, so currently I'm not > proposing any new functionality. If there is a need, I'm sure we can > extend the existing interfaces. > > 2. Do we need to introduce a `DataStream` resource in SQL first? > > Clear no. We want to stay SQL compliant. The result remains a table > where the changeflag become an additional column in the table. All other > operations on tables should still be possible. > > @Shengkai: > > 1. Timing of Option Consumption > > The planner is able to consume these options at an early stage. We can > and should enforce literals at this location, so the map can be accessed > via type inference. Similar to CallContext#getArgumentValue(). > > 2. Column Name Conflicts in SEARCH_KEY > > The resolution logic should be similar to SystemTypeInference [2]. Thus, > conflicting columns can be named `value` and `value0`. > > 3. Details on Correlated PTFs > > I'm not suggesting a user-defined correlated PTF. Similar to how window > TVF are "built-in PTFs" (and we added user-defined PTF later), the > correlated PTFs only exist in planner. They actually already exist today > if you use a PTF and LATERAL. They will be translated into a > LogicalTableScan and LogicalCorrelate, but fail at code generation which > only support regular table functions. > > So implementation-wise correlated PTFs are just a > planner/optimizer/RelNode translation step. I will try to make this > clearer in the FLIP. > > 4. Specifying Literal Values in SEARCH_KEY Function > > Can you give an example for a complex temporal condition? > > @Lincoln: > > 1. Naming of the SEARCH_KEY > > I don't have a strong opinion on naming here. My biggest concern was > that both vector search or full text search are also a LOOKUP into a > different system. So I wanted to focus on what we are looking for, which > is KEY, VECTOR, or TEXT. We can also name it LOOKUP_KEY, LOOKUP_VECTOR, > LOOKUP_TEXT. What do you think? > > Thanks, > Timo > > [1] > > https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#lookup > [2] > > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L240 > > On 15.04.25 07:46, Lincoln Lee wrote: > > +1 for the flip, the snapshot, changelog related ptfs are nice addition > > to flink sql! > > > > I have a question about the naming of the SEARCH_KEY, considering > > that it is a simplified alternative to lookup join, and the word 'search' > > may be more likely to remind users of keyword searching which is a bit > > different from joins in sql, would it be better to consider following the > > naming of lookup, e.g., > > ``` > > SELECT * > > FROM > > t1, > > LATERAL LOOKUP( > > table => dim, > > key => DESCRIPTOR(k1,k2), > > t1, t2...) > > ``` > > > > > > Best, > > Lincoln Lee > > > > > > Shengkai Fang <fskm...@gmail.com> 于2025年4月15日周二 10:12写道: > > > >> Thanks for the FLIP, it helps a lot for us to develop features like > >> VECTOR_SEARCH. But I have some questions about the FLIP: > >> > >> 1. Timing of Option Consumption for SEARCH_KEY Parameters > >> For the FOR SYSTEM_TIME AS OF syntax, the planner leverages hints and > >> catalog tables to load the table scan during the toRel phase. However, > if > >> users use the SEARCH_KEY function, is the planner able to consume these > >> options at the same early stage? > >> > >> 2.Handling Column Name Conflicts in SEARCH_KEY Output Schema > >> What is the output schema behavior for the SEARCH_KEY function when the > >> left and right tables have columns with conflicting names? > >> How can users resolve these name conflicts to ensure unambiguous access > to > >> the desired columns? > >> > >> 3. Details on Correlated PTFs > >> Could you elaborate on correlated PTFs? What is the API design for > >> implementing correlated PTFs? How does a PTF retrieve the required model > >> and lookup function during execution? > >> > >> 4. Specifying Literal Values in SEARCH_KEY Function > >> How can users include literal values in the SEARCH_KEY function > parameters? > >> The FOR SYSTEM_TIME AS OF syntax allows users to define complex temporal > >> conditions. Does SEARCH_KEY support equivalent flexibility for > specifying > >> dynamic or literal values in its parameters? > >> > >> Best, > >> Shengkai > >> > >> > >> > >> Hao Li <h...@confluent.io.invalid> 于2025年4月3日周四 00:12写道: > >> > >>> Hi Timo, > >>> > >>> Any question I have is what's the SEARCH_KEY result schema you have in > >>> mind? Can it output multiple rows for every row in the left table or it > >>> needs to pack the result in a single row as an array? > >>> > >>> Thanks, > >>> Hao > >>> > >>> On Mon, Mar 24, 2025 at 10:20 AM Hao Li <h...@confluent.io> wrote: > >>> > >>>> Thanks Timo for the FLIP! This is a great improvement to the FLINK sql > >>>> syntax around tables. I have two clarification questions: > >>>> > >>>> 1. For SEARCH_KEY > >>>> ``` > >>>> SELECT * > >>>> FROM > >>>> t_other, > >>>> LATERAL SEARCH_KEY( > >>>> input => t, > >>>> on_key => DESCRIPTOR(k), > >>>> lookup => t_other.name, > >>>> options => MAP[ > >>>> 'async', 'true', > >>>> 'retry-predicate', 'lookup_miss', > >>>> 'retry-strategy', 'fixed_delay', > >>>> 'fixed-delay'='10s' > >>>> ] > >>>> ) > >>>> ``` > >>>> Table `t` needs to be an existing `LookupTableSource` [1], right? And > >> we > >>>> will rewrite it to `StreamPhysicalLookupJoin` [2] or similar operator > >>>> during the physical optimization phase. > >>>> Also to support passing options, we need to extend `LookupContext` [3] > >> to > >>>> have a `getOptions` or `getRuntimeOptions` method? > >>>> > >>>> 2. For FROM_CHANGELOG > >>>> ``` > >>>> SELECT * FROM FROM_CHANGELOG(s) AS t; > >>>> ``` > >>>> Do we need to introduce a `DataStream` resource in sql first? > >>>> > >>>> > >>>> Hao > >>>> > >>>> > >>>> > >>>> > >>>> [1] > >>>> > >>>> > >>> > >> > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java > >>>> [2] > >>>> > >>>> > >>> > >> > https://github.com/apache/flink/blob/master/flink-table/flink-table-planner/src/main/scala/org/apache/flink/table/planner/plan/nodes/physical/stream/StreamPhysicalLookupJoin.scala#L41 > >>>> [3] > >>>> > >>>> > >>> > >> > https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/connector/source/LookupTableSource.java#L82 > >>>> > >>>> > >>>> On Fri, Mar 21, 2025 at 6:25 AM Timo Walther <twal...@apache.org> > >> wrote: > >>>> > >>>>> Hi everyone, > >>>>> > >>>>> I would like to start a discussion about FLIP-517: Better Handling of > >>>>> Dynamic Table Primitives with PTFs [1]. > >>>>> > >>>>> In the past months, I have spent a significant amount of time with > SQL > >>>>> semantics and the SQL standard around PTFs, when designing and > >>>>> implementing FLIP-440 [2]. For those of you that have not taken a > look > >>>>> into the standard, the concept of Polymorphic Table Functions (PTF) > >>>>> enables syntax for implementing custom SQL operators. In my opinion, > >>>>> they are kind of a revolution in the SQL language. PTFs can take > >> scalar > >>>>> values, tables, models (in Flink), and column lists as arguments. > With > >>>>> these primitives, we can further evolve shortcomings in the Flink SQL > >>>>> language by leveraging syntax and semantics. > >>>>> > >>>>> I would like introduce a couple of built-in PTFs with the goal to > make > >>>>> the handling of dynamic tables easier for users. Once users > understand > >>>>> how a PTF works, they can easily select from a list of functions to > >>>>> approach a table for snapshots, changelogs, or searching. > >>>>> > >>>>> The FLIP proposes: > >>>>> > >>>>> SNAPSHOT() > >>>>> SEARCH_KEY() > >>>>> TO_CHANGELOG() > >>>>> FROM_CHANGELOG() > >>>>> > >>>>> I'm aware that this is a delicate topic, and might lead to > >> controversial > >>>>> discussions. I hope with concise naming and syntax the benefit over > >> the > >>>>> existing syntax becomes clear. > >>>>> > >>>>> There are more useful PTFs to come, but those are the ones that I > >>>>> currently see as the most fundamental ones to tell a round story > >> around > >>>>> Flink SQL. > >>>>> > >>>>> Looking forward to your feedback. > >>>>> > >>>>> Thanks, > >>>>> Timo > >>>>> > >>>>> [1] > >>>>> > >>>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-517%3A+Better+Handling+of+Dynamic+Table+Primitives+with+PTFs > >>>>> [2] > >>>>> > >>> > >> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=298781093 > >>>>> > >>>> > >>> > >> > > > >