Hi Guowei and Dian,

Thanks for the detailed discussion and for raising these important architectural questions.
I'd like to connect this thread with an earlier discussion I initiated:
https://lists.apache.org/thread/rn8r7pqfgncd2k3tmy9qfxs6xcslr7lf
https://docs.google.com/document/d/17PpHDjFCeL2wcp9Q7t7snyvKnW2ugBKZ-rtqo_nMKQg/edit
and provide more context from real-world AI application scenarios, as well as how my current work and the proposed design align with Flink's evolution.

________________________________

1. Starting point: AI workloads in streaming systems

From current industry usage, AI in streaming systems is not an isolated domain, but an extension of existing Flink workloads. Typical Flink applications already include:
- real-time analytics (dashboards, monitoring)
- event-driven systems (fraud detection, anomaly detection)
- data pipelines (ETL, enrichment)

In recent years, these workloads have been evolving into:
- real-time inference (model scoring / LLM / RAG)
- event-driven decision making
- continuous feature + inference pipelines

For example:
- fraud detection → model inference in stream
- monitoring → anomaly detection models
- BI dashboards → AI-assisted insights

So from a system perspective: AI workloads are not replacing Flink use cases, but extending existing BI / streaming analytics workloads into intelligent pipelines.

________________________________

2. Current gap in Flink (from an application perspective)

Today, implementing inference in Flink typically relies on:
- AsyncFunction / async I/O
- custom batching logic
- ad-hoc retry / timeout handling
- external model services (e.g., Triton)

Flink already provides building blocks for these:
- async processing
- state management
- exactly-once semantics
- integration with inference systems like Triton

However, from an application perspective, we observe that the same pattern is repeatedly reimplemented:
- batching
- concurrency control
- retry / timeout
- backend integration

And these are:
- not standardized
- not reusable
- not aligned with a unified operator abstraction

________________________________

3.
My contributions and what problems they address

In previous contributions, I have been working on multiple aspects of this problem:

Inference execution
- AsyncBatchWaitOperator
- AsyncBatchFunction (DataStream / Table API)
- retry / timeout strategies
- unified metrics
👉 solving: async inference execution, batch-based inference, reliability baseline

________________________________

Backend integration
- Triton inference integration
- HTTP connection pooling
- fallback / retry mechanisms
👉 solving: external inference system integration, production-grade inference calls

________________________________

Runtime control (related work)
- NodeHealthManager
- slot filtering and isolation
👉 solving: runtime stability and resource control

________________________________

These contributions already cover most of the required capabilities:
- async execution
- batching
- retry / timeout
- backend integration

But currently they are distributed across multiple components, without a unified abstraction.

________________________________

4. Design direction: from BI → AI (incremental evolution)

One important point I want to emphasize: this proposal is not introducing an "AI system" into Flink, but extending Flink along its existing evolution path.

Flink has historically evolved as: ETL → Streaming Analytics → Real-time BI → (now) Intelligent Streaming

So the direction here is:
- start from mature BI / SQL / streaming workloads
- incrementally introduce inference capabilities
- unify them into runtime abstractions

This keeps Flink aligned with:
- existing users
- existing APIs
- production workloads

________________________________

5. Relation to flink-agents

I fully agree that Apache Flink Agents is an important direction.
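To make the execution-layer pattern concrete: below is a minimal, self-contained Python sketch of the batching + timeout/retry loop that applications currently hand-roll on top of async I/O, as described above. This is not Flink code and not a proposed API; all names (`MicroBatcher`, `fake_model`, the parameters) are illustrative, and a real operator would additionally flush on timers and at checkpoints.

```python
import asyncio

# Illustrative sketch only (hypothetical names, not a Flink API): the
# micro-batching + timeout/retry loop that streaming inference jobs
# currently hand-roll on top of async I/O.

class MicroBatcher:
    """Buffers records and sends them to an async inference call as a
    batch once `max_batch_size` is reached. A real operator would also
    flush on a timer and at checkpoints; that is omitted here."""

    def __init__(self, infer_fn, max_batch_size=4, max_retries=2, timeout_s=1.0):
        self.infer_fn = infer_fn          # async fn: list[input] -> list[output]
        self.max_batch_size = max_batch_size
        self.max_retries = max_retries    # retries after the first attempt
        self.timeout_s = timeout_s        # per-attempt timeout
        self._buffer = []

    async def add(self, record):
        """Buffer one record; returns inference results when a batch flushes."""
        self._buffer.append(record)
        if len(self._buffer) >= self.max_batch_size:
            return await self.flush()
        return []

    async def flush(self):
        """Send the current buffer, retrying on timeout / connection errors."""
        if not self._buffer:
            return []
        batch, self._buffer = self._buffer, []
        last_err = None
        for _ in range(self.max_retries + 1):
            try:
                return await asyncio.wait_for(self.infer_fn(batch),
                                              timeout=self.timeout_s)
            except (asyncio.TimeoutError, ConnectionError) as err:
                last_err = err
        raise last_err  # retries exhausted: surface the failure to the caller

async def fake_model(batch):
    # Stand-in for an external model service call (e.g. a Triton endpoint).
    return [x * 10 for x in batch]

async def main():
    batcher = MicroBatcher(fake_model, max_batch_size=3)
    results = []
    for record in [1, 2, 3, 4]:
        results += await batcher.add(record)
    results += await batcher.flush()  # drain the tail
    return results

print(asyncio.run(main()))  # [10, 20, 30, 40]
```

The point is not these ~50 lines themselves, but that every team ends up writing a slightly different version of them; a unified operator abstraction is meant to absorb exactly this logic.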
From my understanding:

flink-agents focuses on:
- agent orchestration
- LLM workflows
- tool / memory / reasoning loops

this proposal focuses on:
- execution inside the Flink runtime

Specifically:
- batching at the operator level
- concurrency control
- backpressure-aware scheduling
- checkpoint alignment

So: flink-agents → orchestration layer; this proposal → execution layer. They are complementary rather than overlapping.

________________________________

6. Proposed scope (based on discussion)

Based on feedback, I agree that the scope should be minimal. The plan is to start with:
- a unified InferenceOperator
- a batch + async execution model
- concurrency control
- basic timeout / retry

And leave out:
- a full AI runtime abstraction
- a backend SPI
- advanced reliability mechanisms

________________________________

7. Next step

Based on the discussion, I plan to draft a minimal FLIP focusing on the operator abstraction. Before doing so, I'd appreciate feedback on:
- whether this "BI → AI incremental path" makes sense
- whether introducing an operator-level abstraction is appropriate
- whether this scope is sufficiently minimal

________________________________

8. Closing

From my perspective, this work is about helping Flink close the gap between existing streaming workloads and emerging AI use cases, in a way that is incremental, consistent, and production-oriented.

Thanks again for the discussion — it has been very helpful. For easier understanding, I summarized the evolution path and positioning below:

Best,
Feat Zhang

On Wed, Apr 29, 2026 at 10:05 AM Dian Fu <[email protected]> wrote:
>
> Hi Guowei,
>
> Thanks for driving this FLIP. I believe it will be an important step towards moving Flink from the BI stack towards the AI stack.
>
> Here are my thoughts on the concerns raised by Gustavo:
>
> > - SQL/Table: What is the plan here? How will the new multimodal types (Tensor, Image, Embedding) work in the type system, codegen, and plan/savepoint compatibility?
>
> I think the new multimodal types will also be introduced in the DataStream API & Table API, just like the existing built-in data types in Flink.
>
> > Is there a plan for SQL-level model inference beyond the current ML_PREDICT shape, for example, vector similarity or multimodal predicates? Today this is still very vendor-specific across the industry, so it would be nice to know if Flink wants to take a clear position here, or how this FLIP will fit with the SQL/Table vision.
>
> We intend to introduce more built-in, domain-specific model inference functions beyond ML_PREDICT. This is reflected in the part `FLIP-XXX: Built-in Multimodal Operators and AI Functions`. We plan to introduce quite a few built-in functionalities around multimodal data processing and domain-specific model inference functions to ease the life of users. They will be available for both Python users and SQL users. The detailed list will be discussed in that sub-FLIP.
>
> > - DataStream (v1 and v2): Will RpcOperator and the Arrow-batch primitives be exposed as first-class building blocks for Java users, or only as internal pieces behind the Python DataFrame? Many streaming inference use cases (real-time enrichment, CDC + model scoring) fit very well with DataStream and would benefit from clear guidance.
>
> Good question. Regarding the Arrow-batch primitives, the preliminary thinking is to focus on Python jobs first, since it's very clear that Python users need them. If we see clear use cases where Java users would directly benefit from them, we can absolutely expose them as first-class building blocks. It would be great if you could share more thoughts on how Java users could use them in the above or other scenarios.
>
> Regards,
> Dian
>
>
> On Wed, Apr 29, 2026 at 12:04 AM David Radley <[email protected]> wrote:
> >
> > Hi Guowei,
> > This is an interesting proposal. I second Robert's questions. Some thoughts.
> >
> > Layer 3 does not depend on layers 1 and 2, I think. At a high level I wonder: is the idea that Flink could become like an R ML pipeline or SPSS? It would be good to compare existing technology solutions and what benefits Flink will bring to these scenarios.
> >
> > FLIP-XXX: Supporting RpcOperator — Independently Deployed and Scaled RPC Service Operators - see Robert's comment. I assume this is a specialization of the async I/O operator for RPC. When you say deploying RPC services that are fully managed by the Flink runtime, where would these be deployed? If it is remote, how would this work? It would be interesting to see some use cases where Flink would be deploying RPC services that it has created.
> >
> > FLIP-XXX: Multimodal Data Type System and Object Reference Mechanism
> >
> > I like the idea of adding these types - the interesting part will be the deser.
> >
> > FLIP-XXX: A More Pythonic DataFrame API for Python Users - this makes sense.
> >
> > FLIP-XXX: Connector API for Multimodal Data Source/Sink - I assume this will be renamed to new multimodal formats. Are there existing registries that these could be looked up in - similar to a schema registry - so we can bring in artifacts via metadata?
> >
> > FLIP-XXX: Built-in Multimodal Operators and AI Functions - I wonder if we could bring in existing implementation libraries, and the new work would allow us to call them from Flink, i.e. not having to do them call by call but library by library.
> >
> > FLIP-XXX: Columnar Data Transport and Processing Optimization - this seems a big change: events as columns rather than events as rows or CDC sequences. I assume this would not be exposed in SQL?
> >
> > kind regards, David.
> >
> > From: Robert Metzger <[email protected]>
> > Date: Tuesday, 28 April 2026 at 07:38
> > To: [email protected] <[email protected]>
> > Subject: [EXTERNAL] Re: [DISCUSS] FLIP-577: AI-Native Flink — An Umbrella Proposal for Multimodal Data Processing
> >
> > Hey Guowei,
> >
> > Thanks for the proposal. I just took a brief look; here are some high-level questions:
> >
> > Regarding the RPC Operator: What is the difference to the async I/O operator we have already?
> >
> > "Connector API for Multimodal Data Source/Sink": Why do we need to touch the connector API for supporting multimodal data? Isn't this more of a formats concern?
> >
> > "Non-Disruptive Scaling for CPU Operators": How do you want to guarantee exactly-once with that kind of scaling? E.g. you need to somehow make a handover between the old and new pipeline.
> >
> > Overall, I find the proposal has some things which seem related to making Flink more AI-native, but other changes seem orthogonal to that. For example, the checkpoint or scaling changes are actually unrelated to AI, and are just engine improvements.
> >
> >
> > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
> >
> > > Hi everyone,
> > >
> > > I'd like to start a discussion on an umbrella FLIP[1] that lays out a direction for evolving Flink into a data engine that natively supports AI workloads.
> > >
> > > The short version: user workloads are shifting from BI analytics to multimodal data processing centered on model inference, and this triggers cascading changes across the stack — multimodal data flowing through pipelines, heterogeneous CPU/GPU resources, vectorized execution, and inference tasks that run for seconds to minutes on Spot instances.
> > > The proposal sketches an evolution along five directions (development paradigm, data model, heterogeneous resources, execution engine, fault tolerance), decomposed into 11 sub-FLIPs organized into three layers: core runtime primitives, AI workload expression and execution, and production-grade operational guarantees. Most sub-FLIPs have no hard dependencies on each other and can be advanced in parallel.
> > >
> > > A note on scope, since it's an umbrella:
> > >
> > > - In scope here: whether the evolution directions are reasonable, whether each sub-FLIP's motivation and proposed approach are well-founded, and whether the boundaries and dependencies between sub-FLIPs are clear.
> > > - Out of scope here: detailed designs, API specifics, and implementation plans of individual sub-FLIPs — those will go through their own FLIPs.
> > > - Consensus criteria: agreement on the overall direction is sufficient for the umbrella to pass; passing it does not lock in any sub-FLIP's design — sub-FLIPs may still be adjusted, deferred, or withdrawn as they progress.
> > >
> > > All proposed changes are incremental — no existing API or behavior is removed or altered. Compatibility details are covered at the end of the document.
> > >
> > > Looking forward to your feedback on the overall direction and the layering.
> > >
> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > >
> > > Thanks,
> > > Guowei
> >
> > Unless otherwise stated above:
> >
> > IBM United Kingdom Limited
> > Registered in England and Wales with number 741598
> > Registered office: Building C, IBM Hursley Office, Hursley Park Road, Winchester, Hampshire SO21 2JN
