Hi,
I'd like to start a discussion of FLIP-540: Support VECTOR_SEARCH in Flink
SQL[1].

With FLIP-437 and FLIP-525, Apache Flink gained initial Large Language
Model (LLM) support, enabling semantic understanding and real-time
processing in streaming data pipelines. This integration has been
technically validated in scenarios such as log classification and real-time
question-answering systems. However, the current architecture only allows
Flink to use embedding models to convert unstructured data (e.g., text,
images) into high-dimensional vectors, which are then persisted to
downstream storage systems (e.g., Milvus, MongoDB). Flink itself cannot
query those vectors or run similarity searches over them in real time. To
address this limitation, this FLIP proposes a VECTOR_SEARCH function that
lets users perform streaming vector similarity searches and real-time
context retrieval (e.g., for Retrieval-Augmented Generation, RAG) directly
within Flink.

Looking forward to comments and suggestions for improvements!

Best,
Shengkai

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL
