Hi Timo,

Thank you for your detailed suggestions. Please see my responses below.

1) ProcTime

+1 for aligning the behavior with PTF. I’ve updated the FLIP accordingly.

2) RowTime

I have some concerns regarding the `ROWTIME` handling. Let me illustrate with an example. Suppose the input table schema is:

`<query_col ARRAY<FLOAT>, ts TIMESTAMP(3) *ROWTIME*>`

and the vector table schema is:

`<id INT, search_col ARRAY<FLOAT>>`

Using the following SQL:

```sql
SELECT *
FROM input_table, LATERAL TABLE(VECTOR_SEARCH(
    SEARCH_TABLE => TABLE vector_table,
    COLUMN_TO_SEARCH => DESCRIPTOR(search_col),
    COLUMN_TO_QUERY => input_table.query_col,
    ON_TIME => input_table.ts))
```

The output schema becomes:

ROW<query_col ARRAY<FLOAT>, ts TIMESTAMP(3), id INT, search_col ARRAY<FLOAT>, score DOUBLE, ts0 TIMESTAMP(3)>

This results in two timestamp fields: ts (from the input) and ts0 (generated by the operator). Having both may be confusing. Is this the intended behavior?

3) Naming

I did consider SEARCH_VECTOR, but many vendors use VECTOR_SEARCH, for example Spark[1] and BigQuery[2]. To stay consistent with them and reduce the learning curve, I suggest aligning with existing industry practice.

[1] https://docs.databricks.com/aws/en/sql/language-manual/functions/vector_search
[2] https://cloud.google.com/bigquery/docs/vector-search-intro

Best,
Shengkai

Timo Walther <twal...@apache.org> wrote on Thu, Aug 14, 2025 at 21:49:

> Hi Shengkai,
>
> thank you for proposing this FLIP. Also, thank you for considering my
> thoughts from FLIP-517, even though I haven't managed to finalize the
> discussion/voting yet.
>
> It looks mostly good to me. However, I would like to discuss the
> semantics of the `on_time` parameter:
>
> 1) Proctime
>
> I truly believe we should avoid the need for a `proctime` attribute.
> Teaching the rowtime attributes to users is already painful enough, but
> additionally teaching proctime is worse. For PTFs of FLIP-440, only
> rowtime attributes can be used in f(on_time => ...) and we should do the
> same for future built-in PTFs.
> Not specifying `on_time` can be equal to proctime.
>
> So users can just naturally use the PTF, with the mental model of
> LATERAL being a foreach loop where each invocation happens instantly (in
> processing time).
>
> 2) Rowtime
>
> All PTFs should follow the SystemTypeInference:
>
> https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/types/inference/SystemTypeInference.java#L239
>
> It assumes that when an `on_time` parameter is passed, the result
> appends a `rowtime` column that can be used in subsequent time based
> operations. Can we add such a column in the output for VECTOR_SEARCH as
> well?
>
> 3) Naming
>
> Just a general note, feel free to ignore: A function or operation should
> use a verb not a noun. E.g. JOIN, SEARCH, SELECT. Vector search is a
> concept. The function should rather be called `SEARCH_VECTOR`. This was
> also explained in FLIP-517.
>
> Thanks,
> Timo
>
>
> On 14.08.25 03:31, Shengkai Fang wrote:
> > Hi, all.
> >
> > There has been no feedback for a while. I plan to close this FLIP tomorrow
> > unless there are further comments. Thank you all for the discussion.
> >
> > Best,
> > Shengkai
> >
> > Yash Anand <yashanand.0...@gmail.com> wrote on Thu, Jul 31, 2025 at 15:47:
> >
> >> Hi Shengkai,
> >>
> >> Thanks for the FLIP, this will be a great addition to Flink AI
> >> capabilities. +1 for this feature.
> >>
> >> Best,
> >> Yash Anand
> >>
> >> On Tue, Jul 29, 2025 at 7:23 PM Jacky Lau <liuyong...@gmail.com> wrote:
> >>
> >>> Hi Shengkai,
> >>>
> >>> Thanks for the FLIP and enhancement for AI capabilities in Flink. +1 for
> >>> this feature
> >>>
> >>> Best,
> >>> Jacky Lau
> >>>
> >>> Hao Li <h...@confluent.io.invalid> wrote on Wed, Jul 30, 2025 at 01:03:
> >>>
> >>>> Hi Shengkai,
> >>>>
> >>>> Thanks for the FLIP and enhancement for AI capabilities in Flink. +1.
> >>>>
> >>>> Thanks,
> >>>> Hao
> >>>>
> >>>> On Tue, Jul 29, 2025 at 2:16 AM Shengkai Fang <fskm...@gmail.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>> I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in
> >>>>> Flink SQL[1].
> >>>>>
> >>>>> In FLIP-437/FLIP-525, Apache Flink has initially integrated Large
> >>>>> Language Model (LLM) capabilities, enabling semantic understanding and
> >>>>> real-time processing of streaming data pipelines. This integration has
> >>>>> been technically validated in scenarios such as log classification and
> >>>>> real-time question-answering systems. However, the current architecture
> >>>>> allows Flink to only use embedding models to convert unstructured data
> >>>>> (e.g., text, images) into high-dimensional vector features, which are
> >>>>> then persisted to downstream storage systems (e.g., Milvus, MongoDB).
> >>>>> It lacks real-time online querying and similarity analysis capabilities
> >>>>> for vector spaces. To address this limitation, we propose introducing
> >>>>> the VECTOR_SEARCH function in this FLIP, enabling users to perform
> >>>>> streaming vector similarity searches and real-time context retrieval
> >>>>> (e.g., Retrieval-Augmented Generation, RAG) directly within Flink.
> >>>>>
> >>>>> Looking forward to comments and suggestions for improvements!
> >>>>>
> >>>>> Best,
> >>>>> Shengkai
> >>>>>
> >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL