Hi all,

Thanks again for the thoughtful feedback and the valuable perspectives shared in this discussion.
I have updated FLIP-577 [1] based on the discussion in this thread. The overall direction remains the same, but I have tried to make the scope, motivation, and boundaries clearer. The main changes are:

1. Clarified the target workload as AI-oriented data processing workloads, instead of relying only on the broader "AI-Native" wording.

2. Added explicit non-goals to make clear that this proposal does not aim to turn Flink into an AI framework, ML platform, or model serving system.

3. Added "Why Now" and "Why Flink" sections to better explain the production signals, ecosystem trends, and why Flink's existing runtime strengths are relevant here.

4. Reworked the umbrella rationale. The key point is not that every sub-FLIP is AI-specific, but that these mechanisms need a shared runtime contract across data representation, service invocation, GPU resources, scaling, and recovery.

5. Clarified that engine-level primitives should consider SQL/Table, Java DataStream, and Python DataFrame.

6. Made the initial correctness scope of runtime mechanisms — non-disruptive scaling, UAC enhancements, and Pipeline Region checkpoints — more conservative, with explicit opt-in where default behavior is affected.

I also tried to reflect the earlier questions raised in this thread in the corresponding sections of the updated document. Looking forward to the continued discussion.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275

Best,
Guowei

On Sun, May 10, 2026 at 4:19 PM Xintong Song <[email protected]> wrote: > Hi all, > > First, big thanks to Guowei for kicking off this important discussion, and > to the community for the substantive engagement over the past days. I'd > also like to share some of my own thoughts. > > My apologies for joining late — I've been tied up with other things and > only recently started catching up. Given the length of this thread, it's > possible I haven't fully digested every detail. If I miss anything, please > bear with me and feel free to point it out. > > Let me state my position upfront: a big +1 to this FLIP. Based on the > workload evolution I've been observing, I believe moving Flink toward > AI-Native is necessary — this is a project- and community-level effort that > matters deeply for how Flink responds to the industry shift driven by AI, > and how it captures the new opportunities that come with it. We may still > need more discussion to align on specific technical details, but at the > strategic level of where the community should head, I strongly support this > umbrella. > > Below are my thoughts on a few key questions raised in the discussion so > far. > > > 1. On the scope of the umbrella > > Robert, Yaroslav, and Martijn have all pointed out that some items in the > FLIP — Pipeline Region independent checkpointing, UAC enhancements, > non-disruptive scaling, etc. — are not unique to AI scenarios and would > also apply elsewhere. That observation is fair. > > However, I'd argue the right criterion for whether something belongs under > this umbrella is not whether it is *exclusively* used for AI, but whether > it is *essential* to the central goal here — AI-Native Flink — i.e., > whether AI-Native Flink would actually fall short without it.
> > From that angle, I agree with what Guowei has already laid out: under AI > workload characteristics (per-record processing cost orders of magnitude > higher, GPU resources expensive and scarce, long and long-tailed > asynchronous inference), these otherwise-general mechanisms get pushed to > the limit of their current implementations. They may be adequate for > traditional BI/ETL workloads, but they become real bottlenecks under AI > workloads. Without them, the AI-Native Flink story would be incomplete. So > I support including them under this umbrella. > > > 2. On usage and maintenance complexity > > I understand the concerns Yaroslav and Martijn raised about complexity > costs. That said, I don't think we should conservatively reject a > capability because of complexity, if that capability is critical and brings > substantial benefits. On the necessity of RpcOperator specifically, Zhu > Zhu, Guowei, and others have already provided answers earlier in this > thread, and I'll add some context from the Flink Agents side below. > > As a side observation: with the rapid progress of AI-assisted development > over the past year or two, the way engineers work is shifting in a > noticeable way. On one hand, AI coding tools are helping developers > maintain complex software systems at lower effort. On the other hand, more > and more users are starting to ask AI how to use Flink correctly, and even > letting AI help them develop and operate Flink workloads. Usage and > maintenance complexity still matter, but their relative weight is arguably > going down — this is a trend worth factoring into long-term tradeoffs at > the community level. > > > 3. On "wait for the industry to stabilize before integrating" > > Yaroslav raised a concern about the timing — the LLM tooling landscape is > changing fast, and it's hard to predict what will be needed tomorrow. I'd > offer a different perspective. > > It's *precisely because* the industry is evolving fast and hasn't yet > settled into a stable shape that Flink should join earlier — to influence > and help define how that shape forms. If we just stand by waiting for the > landscape to settle, by the time it does, there may not be a place for > Flink in it, and we'd have missed an important window. > > In fact, projects like Daft and Ray Data are increasingly defining the de > facto standards in this space, as Leonard pointed out earlier in this > thread. If Flink doesn't engage proactively, even keeping up later will put > us in a reactive position. > > Engaging early does carry some cost of trial and error — some capabilities > may need to be deprecated or restructured down the line. But I'd argue this > is a normal part of how open-source projects evolve, and it's both > necessary and worth it. > > > 4. RpcOperator from a Flink Agents perspective > > The benefits of RpcOperator have already been articulated by several people > earlier in the thread. I'd like to add a concrete scenario from the Flink > Agents side — the subagent pattern. > > > 4.1 Brief intro to the subagent pattern > > In agent practice, after planning, the main agent often spawns one or more > subagents to handle specific subtasks. Each subagent has its own LLM > context — its own system prompt, its own context window, its own toolset > and permissions. The main agent only receives the final result from the > subagent, not the intermediate steps. This pattern is now widely adopted in > the industry, and many agent skills explicitly say "spawn a subagent to do > X". 
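> To make the pattern concrete, here is a minimal self-contained Python
> sketch. To be clear, this is illustrative pseudo-structure only; none of
> these names are actual Flink Agents APIs, and call_llm is a stand-in for
> a real model call:
>
>     from dataclasses import dataclass, field
>
>     @dataclass
>     class Subtask:
>         system_prompt: str  # the subagent's own prompt, not the main agent's
>         payload: str        # full context handed over in one shot
>         tools: list = field(default_factory=list)  # subagent-specific toolset/permissions
>
>     def call_llm(system_prompt: str, user_input: str) -> str:
>         # Stand-in for a real model call; returns only a final answer.
>         return f"result({system_prompt!r}): {user_input}"
>
>     def run_subagent(subtask: Subtask) -> str:
>         # Isolated context: intermediate steps (tool calls, scratch
>         # reasoning) stay inside this call and never leak back into
>         # the main agent's context window.
>         return call_llm(subtask.system_prompt, subtask.payload)
>
>     def main_agent(record: str) -> str:
>         # The planning step decides which subtasks to spawn (here: one).
>         subtasks = [Subtask("You summarize documents.", record)]
>         # The main agent only ever sees final results, one per subtask.
>         return " | ".join(run_subagent(t) for t in subtasks)
>
> The interesting production question, discussed below, is where
> run_subagent actually executes: in place inside the operator, or on a
> shared elastic pool.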
> > Flink Agents wants to support this pattern for several reasons: > compatibility with the existing agent skill ecosystem; context isolation, > so that intermediate state during subagent execution doesn't pollute the > main agent's context; and one additional advantage that enterprise > production scenarios have over single-machine personal-assistant agents — > the ability to dispatch resource-intensive subtasks to a shared resource > pool for efficient resource reuse. > > > 4.2 Why this needs RpcOperator instead of in-place execution inside the > Flink Agents operator > > Why not just spawn subagents directly inside the Flink Agents operator? > Because subagent workloads are highly bursty — within the same job, one > minute's input might need several subagents for deep processing; the next > minute's input might need none. Under that kind of randomness, statically > allocating subagent resources per operator parallelism leaves no good > answer: under-allocate and the operator gets blown out the moment demand > arrives; over-allocate and resources sit idle most of the time. > > The natural answer is to put subagent execution into a shared resource pool > that can be used by multiple upstream operator subtasks, and that can scale > elastically based on load. This maps directly onto the design and core > benefits of RpcOperator that Zhu Zhu laid out earlier — each RpcOperator > instance forming its own Pipelined Region, which in turn enables > independent scaling and flexible load balancing. > > > 4.3 Why this shared resource pool should be inside Flink rather than > deployed externally > > Two reasons. > > First, the prompts, toolsets, and permission configurations a subagent > needs at execution time are essentially part of the Flink Agents *job > definition*. Stripping them out of the job and deploying them separately > effectively splits the job definition in two, leaving users to maintain a > synchronization relationship between an external deployment and the Flink > job. This goes against the developer experience Flink Agents aims to > provide. > > Second, subagent execution needs to be incorporated into Flink's checkpoint > mechanism. The notion of "stateful" here deserves a brief clarification: > each subagent task is self-contained — the full context it needs is handed > over by the main agent at dispatch time in one shot. That is, subagents > don't need to share state across records, so at the task level they look > stateless. However, executing a single subagent task takes a significant > amount of time and involves multiple rounds of model calls and tool calls, > where tool calls may have externally observable side effects (sending > emails, writing to external systems, etc.). This means we need failure > recovery for the in-flight computation state during task execution, so that > subagent execution preserves exactly-once semantics across failovers. > > > To wrap up, let me reiterate my support for the overall direction of this > FLIP. There are clearly technical details still to be worked out, but I > believe the direction described by this umbrella is both necessary and > important for the long-term evolution of the Flink community. Looking > forward to continued discussion in the sub-FLIPs ahead. > > Thanks! > > Best, > > Xintong > > > > On Fri, May 8, 2026 at 3:43 PM Guowei Ma <[email protected]> wrote: > > > Hi all, > > > > Thanks everyone for the in-depth discussion over the past few days. Let > me > > first summarize my understanding of the discussion so far. 
I see strong > > interest in making Flink better support AI-oriented data processing > > workloads, especially multimodal and inference-oriented pipelines. At the > > same time, a recurring concern is that RpcOperator and some of the Layer > 3 > > runtime improvements also look like general Flink capabilities, so why > > should they be discussed together under the AI-Native / multimodal > > processing umbrella? > > > > I think the key question is not whether these mechanisms can only be used > > for AI scenarios, but whether they jointly form the runtime contract > Flink > > needs to support AI-oriented data processing workloads, especially > > multimodal inference pipelines. Capabilities such as RpcOperator, > > checkpointing, scaling, and resource management can certainly serve > broader > > use cases. However, under this class of workloads, they require more > > consistent runtime semantics and coordinated design, and together > determine > > whether the system can provide production-grade execution, recovery, and > > elasticity. > > > > Compared with traditional streaming workloads, this class of workloads > > changes the data shape, computation pattern, and resource model > > significantly. Data is no longer only small structured records; it may be > > images, video, audio, tensors, embeddings, or object references to > external > > large objects. Many pipelines also have a relatively shuffle-light shape, > > such as URI → preprocessing → inference → sink. The computation logic > often > > includes model inference or service-style invocation, either as remote > > inference service invocation or local GPU / accelerator-backed execution. > > Therefore, the system needs to handle not only ordinary RPC calls, but > also > > service discovery, backpressure propagation, batching / concurrency > > control, timeout / retry, in-flight request draining, model loading, GPU > > warmup, resource scheduling, and fault recovery. > > > > From this perspective, RpcOperator is not just “another way to call an > > external service,” nor is it merely a deployment mechanism for GPU > > operators. More importantly, it defines a service-style operator > > abstraction: when inference becomes part of the logical data flow, Flink > > needs to understand and coordinate these runtime semantics, rather than > > hiding inference completely behind external black-box calls inside user > > code. > > > > Some of the Layer 3 runtime improvements follow the same logic. While > > mechanisms such as checkpointing or scaling are not exclusive to AI > > workloads, inference-oriented workloads fundamentally change their > > operational assumptions and cost model, making runtime behavior far more > > critical than in traditional data processing systems. When per-record > > computation is expensive, GPU warmup and model loading are costly, and > the > > execution environment may involve elastic / preemptible resources, the > cost > > of global rollback or disruptive scaling becomes much higher than in > > traditional row-at-a-time BI / ETL workloads. As a result, runtime > behavior > > that may have been acceptable for traditional workloads can directly > affect > > stability and resource efficiency in inference-oriented workloads. > > > > At the same time, the umbrella proposal helps provide a shared context > for > > discussing how these capabilities relate to each other and what common > > runtime assumptions they rely on. 
The more important value of the > umbrella > > is to align on the workload model, design principles, boundaries, and > > dependencies between capabilities, so that independently evolving pieces > > such as RpcOperator, GPU resource declaration, batching / concurrency > > control, non-disruptive scaling, and regional checkpointing do not end up > > with inconsistent runtime semantics. > > > > Based on this discussion, I will update the proposal to make the workload > > model, RpcOperator boundary, and Layer 3 dependency relationship clearer. > > > > Best, > > Guowei > > > > > > On Fri, May 8, 2026 at 12:37 PM zhangjiaogg <[email protected]> wrote: > > > > > Hi Guowei and all, > > > > > > Thank you for driving this initiative. Strong +1 on the overall > > direction. > > > > > > From our perspective, the core value of this proposal lies in two > areas: > > > extending Flink's intelligent processing capabilities for multimodal > > data, > > > and enabling native, in-pipeline local inference. As AI capabilities > > > continue to advance, multimodal data is accounting for an increasingly > > > large share of overall data volume, and the ability to perform > > intelligent, > > > real-time processing on this data — not just ingestion or routing, but > > > actual inference and transformation within the stream — is becoming a > > > critical requirement across industries. Today, most pipelines treat > > > multimodal objects as opaque blobs and push inference to external > > systems, > > > which works but at the cost of complexity, latency, and consistency. > > > > > > This is exactly the pain one of our customers experiences in their > > > autonomous driving data pipelines. Video and image data captured by > > onboard > > > cameras must go through annotation, frame extraction, quality > filtering, > > > and both unstructured and structured data transformation before it can > be > > > used for model training — a workflow that today requires combining > > multiple > > > specialized systems: a stream engine for structured processing, a > > separate > > > framework (e.g., Ray Data) for multimodal processing, and an external > > model > > > serving layer for inference. Each system boundary introduces > intermediate > > > storage, operational overhead, and data consistency challenges across > the > > > full pipeline. If Flink can handle this end-to-end — with native > > multimodal > > > types, local GPU inference operators, and unified checkpointing — the > > > entire workflow becomes a single Flink job, intermediate storage is > > > eliminated, and fault recovery covers the pipeline as a whole. > > > > > > We look forward to the sub-FLIP discussions and would be happy to > > > contribute. > > > > > > Best regards > > > Jiao Zhang > > > > > > At 2026-05-07 12:58:15, "zihao chen" <[email protected]> wrote: > > > >Hi, all, > > > > > > > >I’d like to share some thoughts based on our internal experience > > > >with AI workloads on Flink. > > > > > > > >At Tencent, we have production scenarios where Flink is used in > > > >AI-related pipelines. > > > > > > > >Based on these workloads, we explored elasticity and autoscaling > > > >for cloud-native stream processing systems and published our > > > >experience in SIGMOD 2025: > > > > > > > >"Oceanus: Enable SLO-Aware Vertical Autoscaling for > > > >Cloud-Native Streaming Services in Tencent" [1] > > > > > > > >As our workloads evolved, we also started to see increasing > > > >GPU-based training and inference scenarios.
> > > > > > > >Our current solution integrates Flink with external GPU services. > > > >While this works functionally, it also introduces several practical > > > >issues, such as: > > > > - fragmented lifecycle management > > > > - operational complexity > > > > - inconsistent scaling/recovery behavior across systems > > > > > > > >From this perspective, I think FLIP-577 is exploring a very > > > >meaningful direction. > > > > > > > >In particular, I agree with the idea of integrating GPU-backed > > > >computation more naturally into Flink’s runtime model, instead of > > > >treating it purely as an external service integration problem. > > > > > > > >Besides, from the elasticity perspective, our experience is that > > > >GPU workloads have very different characteristics compared with > > > >traditional CPU workloads: > > > > - GPU resources are expensive and scarce > > > > - Startup and replay costs are significantly higher > > > > - Long-running tasks make scaling and recovery more challenging > > > > > > > >In our experience, GPU elasticity cannot simply reuse the > > > >assumptions behind traditional CPU elasticity. > > > > > > > >Because of this, elasticity becomes especially important for > > > >production AI workloads, not only for resource efficiency, but also > > > >for reducing scaling and recovery overhead. > > > > > > > >More broadly, AI workloads increasingly require Flink to collaborate > > > >more naturally with GPU-backed computation, and I believe > > > >FLIP-577 is exploring an important direction toward addressing > > > >this gap. > > > > > > > >Overall, I’m looking forward to further discussions about this FLIP. > > > > > > > >[1] https://dl.acm.org/doi/abs/10.1145/3722212.3724445 > > > > > > > >Best, > > > >Zihao Chen > > > > > > > >Yong Fang <[email protected]> wrote on Thu, May 7, 2026 at 11:11: > > > >> Hi devs, > > > >> > > > >> Thanks Guowei for initiating this proposal. I think this is an > > important > > > >> step for Flink towards the era of AI data processing, very big +1. > > > >> > > > >> I’d like to share some scenarios and requirements of leveraging > > PyFlink > > > for > > > >> AI data processing at ByteDance. Currently, we run tens of thousands > > of > > > >> pyflink/flink jobs, using millions of CPU cores. > > > >> > > > >> 1) Multimodal Data Processing > > > >> We want to use PyFlink to generate multimodal feature data. The > > typical > > > >> workflow starts by reading ID-based raw data and performing large > > table > > > >> joins and ETL computations. We then fetch multimodal assets such as > > > images, > > > >> videos, text and audio from object storage by ID. These multimodal > > > data > > > >> are either sent to an RPC service (backed by local models or remote > > > large > > > >> models), or processed via local GPU computing for frame extraction, > > > >> embedding generation and other tasks. After multimodal computation, > > > output > > > >> results including embeddings, processed images and multimodal > metadata > > > are > > > >> generated and persisted into the downstream multimodal data lake. > > > >> > > > >> 2) Stream-Batch Unified Data Training > > > >> We use PyFlink to consume processed sample data from MQ or data > lakes. > > > >> Within the data pipeline, data may be shuffled by key, then fed into > > > >> parameter servers or local services for CPU-based or GPU-based model > > > >> training.
Such workloads strongly demand optimized CPU & GPU hybrid > > > >> scheduling, worker node restart capability, fast scaling, as well as > > > native > > > >> support for unified streaming and batch training. > > > >> > > > >> While supporting the above two categories of workloads, we have > > several > > > >> common requirements for PyFlink and Flink Core: > > > >> > > > >> 1) Native Python Computing Capability > > > >> We need more user-friendly DataFrame APIs, comprehensive built-in > data > > > >> types for image and audio, as well as richer multimodal computing > > > >> operators. This enables users to develop multimodal data processing > > jobs > > > >> more efficiently, and allows better optimization and scheduling at > the > > > >> execution plan layer. > > > >> > > > >> 2) CPU & GPU Scheduling and Resource Management > > > >> It is expected to tag resource requirements at fine-grained levels > > such > > > as > > > >> python user-defined functions. The scheduler should provide enhanced > > > >> resource orchestration and task scheduling, enabling more flexible > > > >> heterogeneous resource management. > > > >> > > > >> 3) Embedded Server Node Capability in Pipeline > > > >> We hope to launch dedicated server nodes inside a single pipeline, > to > > > load > > > >> local models or access remote models, which may be shared between > > > different > > > >> operators in the pipeline. This unifies data ETL and multimodal > > > computing > > > >> within one end-to-end pipeline, greatly simplifying operation and > > > >> maintenance for business teams. > > > >> > > > >> 4) Performance and Stability Optimization > > > >> Key enhancements include zero-copy data transfer between Flink TM > > > processes > > > >> and Python processes, fast scaling of compute nodes, fast > > checkpointing > > > >> mechanism, and shuffle optimization for large-scale datasets. These > > > >> improvements will significantly boost the performance and stability > of > > > >> PyFlink multimodal workloads. > > > >> > > > >> I’m really excited to see that FLIP-577 has covered all the real > > > production > > > >> scenarios and requirements mentioned above. It's a good chance to > > > iterate > > > >> and enrich the core capabilities of PyFlink and Flink Core targeting > > > these > > > >> AI data processing scenarios, and build Flink into a first-class AI > > data > > > >> processing engine. > > > >> > > > >> I’m very much looking forward to the progress. > > > >> > > > >> Best, > > > >> FangYong > > > >> > > > >> On Wed, May 6, 2026 at 8:36 PM FeatZhang <[email protected]> > > wrote: > > > >> > > > >> > Hi all, > > > >> > > > > >> > This is a great topic, and honestly long overdue. > > > >> > > > > >> > With the rapid growth of AI applications, we have been seeing a > > > >> > significant increase in real-world demands from users who are > > already > > > >> > building on Flink and other traditional data processing or BI > > engines. > > > >> > From a platform perspective, more and more teams are trying to > > > >> > integrate AI capabilities directly into their existing streaming > > > >> > pipelines, rather than treating them as separate systems. > > > >> > > > > >> > This is not an isolated trend — it is becoming a common > requirement > > > >> > across industries. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 1. 
What is changing in real systems > > > >> > > > > >> > We are observing a consistent shift: > > > >> > > > > >> > AI is moving from offline analysis or request-time scoring to > > > >> > continuous, event-driven decision making. > > > >> > > > > >> > In other words: > > > >> > > > > >> > AI is becoming part of the data stream itself. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 2. Representative production scenarios > > > >> > > > > >> > 2.1 Real-time fraud detection (per-event decision under strict > > > latency) > > > >> > > > > >> > Typical setup > > > >> > > > > >> > Continuous transaction stream (payments, logins, transfers) > > > >> > Each event must be evaluated within milliseconds > > > >> > Decision depends on: > > > >> > > > > >> > recent user behavior > > > >> > device / IP patterns > > > >> > short-term aggregates > > > >> > > > > >> > What is happening in practice > > > >> > > > > >> > Models are already deeply integrated into decision flow > > > >> > Feature freshness directly impacts detection accuracy > > > >> > > > > >> > Current pain > > > >> > > > > >> > Features computed in streaming, but inference is remote > > > >> > Network overhead adds to critical path latency > > > >> > Hard to ensure training/serving consistency > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 2.2 Real-time recommendation and ads (continuous re-ranking) > > > >> > > > > >> > Typical setup > > > >> > > > > >> > User interaction stream (click, view, dwell time) > > > >> > Continuous feature updates (session + short-term behavior) > > > >> > Inference triggered per interaction > > > >> > > > > >> > What is happening in practice > > > >> > > > > >> > Increasing reliance on real-time context > > > >> > Model-based ranking becomes core logic > > > >> > > > > >> > Current pain > > > >> > > > > >> > Offline and online feature pipelines diverge > > > >> > Training-serving skew is common > > > >> > Inference orchestration is ad-hoc > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 2.3 Streaming RAG / knowledge systems (continuous indexing) > > > >> > > > > >> > Typical setup > > > >> > > > > >> > Continuous ingestion of documents, logs, or knowledge > > > >> > Pipeline: > > > >> > > > > >> > chunking → embedding → indexing → retrieval > > > >> > > > > >> > Typical use cases > > > >> > > > > >> > AI copilots > > > >> > enterprise knowledge assistants > > > >> > observability systems > > > >> > > > > >> > Current pain > > > >> > > > > >> > Built via loosely coupled services or scripts > > > >> > No strong consistency guarantees > > > >> > Difficult to scale and recover > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 2.4 Real-time feedback loop (continuous evaluation) > > > >> > > > > >> > Typical setup > > > >> > > > > >> > Prediction at time T > > > >> > Label arrives at T + Δ > > > >> > > > > >> > Required processing > > > >> > > > > >> > prediction stream JOIN label stream → metrics → optimization > > > >> > > > > >> > Current pain > > > >> > > > > >> > Alignment logic duplicated across systems > > > >> > Late data handling is complex > > > >> > No reusable evaluation abstraction > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 2.5 AI replacing rule-based decision logic > > > >> > > > > >> > Evolution > > > >> > > > > >> > From: > > > >> > > > > >> > rule engine / CEP > > > >> > > > > >> > To: > > > >> > > > > >> > model / LLM → decision > > > >> > > > > >> > 
Implication > > > >> > > > > >> > AI is becoming the core decision layer inside streaming systems. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 3. Architectural shift > > > >> > > > > >> > Across all scenarios: > > > >> > > > > >> > From: > > > >> > > > > >> > data processing → feature system → model serving → evaluation > > > >> > > > > >> > To: > > > >> > > > > >> > stream = feature + inference + decision + feedback > > > >> > > > > >> > This reflects a fundamental change in system boundaries. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 4. Why externalized architectures break down > > > >> > > > > >> > Most current implementations rely on multiple systems: > > > >> > > > > >> > stream processing > > > >> > feature store > > > >> > model serving > > > >> > vector database > > > >> > > > > >> > This introduces several fundamental issues in real-time scenarios. > > > >> > > > > >> > 4.1 Latency dominated by system boundaries > > > >> > > > > >> > stream → network → model service → response > > > >> > > > > >> > network overhead is unavoidable > > > >> > batching is not controlled by the stream runtime > > > >> > no end-to-end backpressure > > > >> > > > > >> > Latency becomes a system-level artifact rather than a compute > > > property. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 4.2 Inconsistent data between training and serving > > > >> > > > > >> > offline vs online features > > > >> > different definitions or time windows > > > >> > > > > >> > Models operate on inconsistent data distributions. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 4.3 State fragmentation > > > >> > > > > >> > user/session context must be rebuilt or fetched > > > >> > loss of data locality > > > >> > processing becomes call-driven > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 4.4 Feedback loop is not composable > > > >> > > > > >> > difficult alignment of prediction and label streams > > > >> > no unified handling of late data > > > >> > duplicated evaluation logic > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 4.5 Operational complexity > > > >> > > > > >> > multiple systems to scale > > > >> > multiple failure domains > > > >> > complex debugging paths > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 5. Why this aligns with Flink > > > >> > > > > >> > These workloads require: > > > >> > > > > >> > event-driven execution > > > >> > strong state management > > > >> > precise time semantics > > > >> > continuous feedback > > > >> > > > > >> > These are exactly Flink’s core strengths. > > > >> > > > > >> > The key insight is: > > > >> > > > > >> > Inference should be modeled as a dataflow operator, not an > external > > > >> > service. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 6. Implication > > > >> > > > > >> > If we model: > > > >> > > > > >> > data → feature → inference → decision → feedback > > > >> > > > > >> > within Flink, we can achieve: > > > >> > > > > >> > unified scheduling > > > >> > shared state > > > >> > consistent time semantics > > > >> > end-to-end fault tolerance > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > 7. Conclusion > > > >> > > > > >> > This is not simply about adding AI support to Flink. 
> > > >> > > > > >> > It is about recognizing that: > > > >> > > > > >> > Real-time AI systems are fundamentally streaming systems. > > > >> > > > > >> > The question is whether Flink evolves to support this natively, or > > > >> > remains a preprocessing layer in front of external AI stacks. > > > >> > > > > >> > ________________________________ > > > >> > > > > >> > Happy to follow up with a more concrete proposal (e.g., inference > > > >> > operator abstraction) if there is interest. > > > >> > > > > >> > Thanks. > > > >> > > > > >> > On Mon, May 4, 2026 at 10:31 PM Gen Luo <[email protected]> > > wrote: > > > >> > > > > > >> > > Hi all, > > > >> > > > > > >> > > Thank you Guowei Ma for driving this discussion, and thanks > > everyone > > > >> for > > > >> > > the valuable insights. Inspired by this exchange, I’d like to > > share > > > a > > > >> few > > > >> > > thoughts. > > > >> > > > > > >> > > While “AI-Native” covers broad ground, I believe this FLIP does > > not > > > >> > > overextend Flink’s scope. It’s a necessary iteration driven by > > > evolving > > > >> > > user scenarios and AI advancements, particularly multimodal > > > processing. > > > >> > > Given the growing adoption of multimodal applications and > > increasing > > > >> > > interest in low-latency inference, initiating these enhancements > > is > > > a > > > >> > > timely step to better align Flink with evolving AI workloads. > > > >> > > > > > >> > > From our engagements with customers and developers, we observe a > > > clear > > > >> > > shift in both workloads and user expectations. Model inference > is > > > >> > > increasingly central to data pipelines, with multimodal AI tasks > > > >> growing > > > >> > > rapidly. Traditional real-time scenarios (e.g., monitoring and > > > >> analytics) > > > >> > > now leverage models and agent frameworks like Flink Agent for > > > >> > intelligent, > > > >> > > multi-turn decision-making, while large-scale offline compute is > > > also > > > >> > > shifting toward LLMs and vision models. Alongside this workload > > > >> > evolution, > > > >> > > developer workflows have adapted: AI practitioners naturally > > prefer > > > >> > Python > > > >> > > and DataFrame-style APIs. As AI-assisted coding matures, > aligning > > > >> system > > > >> > > interfaces with these familiar patterns will directly improve > > > >> > AI-generated > > > >> > > code quality and significantly lower adoption barriers for the > AI > > > >> > community. > > > >> > > > > > >> > > Today, many AI evaluation tools don’t yet recommend Flink for AI > > > >> > > workloads—largely due to limited visibility of Flink’s relevant > > > >> > > capabilities rather than fundamental incompatibility. In > reality, > > > Flink > > > >> > has > > > >> > > unique strengths here. For example, generating multimodal > samples > > is > > > >> > often > > > >> > > a multi-day, GPU-heavy process. Flink’s streaming model, > combined > > > with > > > >> > > checkpointing and reduced disk I/O, is well-suited for such > > > >> long-running > > > >> > > tasks—a direction also pursued by engines like Daft and Ray > Data. > > > With > > > >> > > Flink’s proven production stability, we’re well-positioned for > > both > > > >> batch > > > >> > > and future real-time multimodal streaming inference. Targeted > > > >> > improvements > > > >> > > can make these advantages visible, driving better user > experiences > > > and > > > >> > > healthier ecosystem growth. > > > >> > > > > > >> > > I’d also note a lesson from FlinkML. 
It attempted to cover model > > >> training > > > > but struggled to align with the fast-iteration, > > > Python/notebook-centric > > > > workflows preferred by ML researchers. Flink’s core strength lies > in > > > >> > > high-concurrency, production-grade inference orchestration—not > > training > > > >> > > lifecycle management (e.g., experiment tracking, versioning). This > > > >> > mismatch > > > >> > > limited its adoption. > > > >> > > > > > >> > > This proposal, however, takes a different path. It doesn’t aim > to > > > >> replace > > > >> > > training frameworks. Instead, it introduces modern AI concepts > > > >> > (multimodal > > > >> > > data, LLMs) as first-class citizens for inference, built atop > > > Flink’s > > > >> > > computation strengths. Think Ray Data’s scope (plus simple > > > co-located > > > >> > > serving), not Train/Tune. Crucially, unlike the FlinkML era, > > today’s > > > >> > models > > > >> > > use standardized interfaces and mature serving frameworks, > > allowing > > > >> Flink > > > >> > > to integrate external models seamlessly without heavy > > > >> > > customization—significantly lowering project risk. > > > >> > > > > > >> > > This FLIP marks another starting point for Flink in the AI era. > > While > > > >> > > details need refinement, I believe this direction aligns with > both > > > >> > current > > > >> > > and future user needs and Flink’s evolution. > > > >> > > > > > >> > > Best, > > > >> > > Gen > > > >> > > > > > >> > > On Mon, May 4, 2026 at 12:15 PM Jark Wu <[email protected]> > wrote: > > > >> > > > > > >> > > > Hi Guowei, > > > >> > > > > > > >> > > > Thanks for driving this. +1 on the overall direction. Flink's > > > >> > > > streaming processing and checkpoint mechanism give it a > > structural > > > >> > > > advantage over systems like Daft and Ray. But today, these > > runtime > > > >> > > > strengths are held back by gaps in Python API, GPU scheduling, > > and > > > >> > > > native multimodal data handling. This umbrella FLIP addresses > > > exactly > > > >> > > > that gap, comprehensively and systematically. I believe > > multimodal > > > >> > > > data processing is the biggest opportunity for traditional > data > > > infra > > > >> > > > to transition into AI infra, and this is one of the most > > important > > > >> > > > FLIPs for Flink in the AI era. > > > >> > > > > > > >> > > > As one of the Table/SQL module maintainers, we would like to > > > >> > > > contribute the built-in multimodal processing UDFs (audio, > > video, > > > >> > > > image, text) and native multimodal data types (Tensor, Image, > > > >> > > > Embedding, etc.) as first-class citizens in the type system. > > > Looking > > > >> > > > forward to the sub-FLIP discussions. > > > >> > > > > > > >> > > > Best, > > > >> > > > Jark > > > >> > > > > > > >> > > > On Thu, 30 Apr 2026 at 18:42, Guowei Ma <[email protected] > > > > > >> wrote: > > > >> > > > > > > > >> > > > > Hi, Yaroslav > > > >> > > > > > > > >> > > > > Thanks for taking the time to write this detailed feedback. > > Let > > > me > > > >> > > > clarify > > > >> > > > > the intent of the proposal first. > > > >> > > > > > > > >> > > > > I am not saying that Flink should become an AI framework, an > > ML > > > >> > platform, > > > >> > > > > or a model serving system.
The way I use "AI-Native" in this > > > >> > proposal is > > > >> > > > to > > > >> > > > > say that Flink should support, as first-class citizens, the > > core > > > >> > objects > > > >> > > > > and execution patterns that frequently show up in > AI-oriented > > > data > > > >> > > > > processing — instead of leaving them entirely to external > > > systems > > > >> or > > > >> > ad > > > >> > > > hoc > > > >> > > > > user-defined integrations. > > > >> > > > > > > > >> > > > > These objects and execution patterns include: > > > >> > > > > > > > >> > > > > - Multimodal and unstructured data objects such as > images, > > > >> video, > > > >> > > > audio, > > > >> > > > > tensors, embeddings, and object references. > > > >> > > > > - Model inference as part of the data flow, rather than > an > > > >> > entirely > > > >> > > > > external black-box service call. > > > >> > > > > - Operators backed by heterogeneous resources such as > GPUs. > > > >> > > > > - Pythonic and vectorized processing styles. > > > >> > > > > - Long-running, long-tailed asynchronous computation. > > > >> > > > > > > > >> > > > > "AI-Native" is just a shorthand here, meaning that Flink > > should > > > >> > natively > > > >> > > > > understand and support the core abstractions of this class > of > > > >> > workloads. > > > >> > > > > The FLIP needs to make the target workload class clearer. > What > > > we > > > >> > care > > > >> > > > > about is not any specific model paradigm — LLM, CV, > > > recommendation, > > > >> > or > > > >> > > > > traditional ML inference — but a class of data processing > > > workloads > > > >> > with > > > >> > > > > shared runtime and topology characteristics: > > > >> > > > > > > > >> > > > > - A single computation may take seconds or even minutes, > > > instead > > > >> > of > > > >> > > > > microseconds as in traditional row-at-a-time processing. > > > >> > > > > - Execution often involves heterogeneous resources such > as > > > CPU + > > > >> > GPU, > > > >> > > > > where GPUs are expensive and scarce. > > > >> > > > > - Data is often multimodal large objects (images, video, > > > audio, > > > >> > > > tensors, > > > >> > > > > embeddings), rather than structured small records. > > > >> > > > > - Computation logic often includes model inference or > > > >> > service-style > > > >> > > > > invocations as part of the pipeline. > > > >> > > > > - Many target topologies are relatively shuffle-light and > > > don't > > > >> > > > > necessarily involve complex keyed-state migration, e.g. > > URI → > > > >> > > > preprocessing > > > >> > > > > → inference → sink. > > > >> > > > > > > > >> > > > > Ten years ago, many ML workloads took the form of offline > > > training > > > >> > plus > > > >> > > > > online feature serving. Flink already played a strong role > in > > > >> feature > > > >> > > > > engineering, streaming feature computation, and real-time > data > > > >> > > > preparation, > > > >> > > > > so there was no strong need to reshape Flink into an > > "ML-Native" > > > >> > engine. > > > >> > > > > > > > >> > > > > What is changing today is that model inference itself is > > > >> increasingly > > > >> > > > > becoming part of the data processing pipeline; multimodal > > > objects > > > >> > are no > > > >> > > > > longer just opaque blobs in external storage, but data > objects > > > that > > > >> > need > > > >> > > > to > > > >> > > > > be referenced, passed, transformed, inferred over, and > landed > > > >> inside > > > >> > the > > > >> > > > > engine. 
This is not simply one more ML use case — it is a > > > change in > > > >> > the > > > >> > > > > shape of workloads Flink needs to support. > > > >> > > > > > > > >> > > > > On whether the user demand is real, the validation signals > we > > > are > > > >> > > > currently > > > >> > > > > seeing include: > > > >> > > > > > > > >> > > > > - Within Alibaba, multimodal data processing is already > in > > > >> > production, > > > >> > > > > covering image, video, audio, and text modalities. > > > >> > > > > - In offline conversations with several companies > > (including > > > >> > ByteDance > > > >> > > > > and Tencent), we have heard substantial demand for Flink > to > > > >> > support > > > >> > > > AI data > > > >> > > > > processing / multimodal data processing. > > > >> > > > > - On the ecosystem side, we are working with NVIDIA on a > > > joint > > > >> > demo > > > >> > > > > focused on multimodal data processing, planned for Flink > > > Forward > > > >> > Asia. > > > >> > > > > - The emergence and growth of systems such as Daft, Ray > > Data, > > > >> > > > > Data-Juicer, and LAS also reflect rapidly growing demand > > for > > > >> > > > multimodal > > > >> > > > > data processing. > > > >> > > > > - There have also been independent discussions in this > > > direction > > > >> > > > within > > > >> > > > > the community — for example, the "Streaming-native AI > > > Inference > > > >> > > > Runtime > > > >> > > > > Layer" proposal on the dev list. > > > >> > > > > > > > >> > > > > On "why now, instead of waiting for standardization" — I > > > understand > > > >> > the > > > >> > > > > concern. LLM-related frameworks, APIs, and application-level > > > >> > patterns are > > > >> > > > > indeed changing quickly. If this FLIP were trying to bake a > > > >> specific > > > >> > LLM > > > >> > > > > API, agent framework, or prompt protocol into Flink, the > risk > > > would > > > >> > be > > > >> > > > high. > > > >> > > > > > > > >> > > > > But most of the capabilities in this proposal are not > > > LLM-specific. > > > >> > They > > > >> > > > > are more fundamental data processing and runtime > capabilities: > > > >> > Pipeline > > > >> > > > > Region-level checkpointing, Object Reference, GPU resource > > > >> > declaration, > > > >> > > > > columnar data transfer, service-style operator invocation, > > > >> > long-running > > > >> > > > > async execution. These are useful for today's LLM workloads, > > and > > > >> > equally > > > >> > > > > useful for future AI workloads in shapes we cannot fully > > predict > > > >> > yet. The > > > >> > > > > fast-changing parts should live in the ecosystem and SDK > > layer; > > > the > > > >> > FLIP > > > >> > > > > should focus on more stable engine-level capabilities. > > > >> > > > > > > > >> > > > > On tactical changes vs. umbrella, I partly agree with you. > > Each > > > >> > sub-FLIP > > > >> > > > > should be discussed, reviewed, and accepted or rejected on > its > > > own > > > >> > > > merits. > > > >> > > > > The umbrella should not bypass the normal FLIP process, and > > > >> > accepting the > > > >> > > > > umbrella does not mean accepting all sub-FLIPs. That said, I > > > still > > > >> > think > > > >> > > > > the umbrella is valuable. 
Its purpose is not to bind the 11 > > > changes > > > >> > into > > > >> > > > a > > > >> > > > > single inseparable package, but to help the community align > on > > > >> > > > principles, > > > >> > > > > clarify boundaries and dependencies, and avoid conflicting > or > > > >> > duplicated > > > >> > > > > abstractions across related capabilities. > > > >> > > > > > > > >> > > > > For example, if RpcOperator is not considered together with > > > >> > > > non-disruptive > > > >> > > > > scaling, it is hard to give GPU operator elasticity coherent > > > >> > semantics. > > > >> > > > > Deploying inference services independently is only the first > > > step; > > > >> > the > > > >> > > > > harder question is how Flink uniformly handles service > > > discovery, > > > >> > > > in-flight > > > >> > > > > request draining, backpressure, and failover during scaling. > > > >> Without > > > >> > an > > > >> > > > > umbrella, these capabilities can certainly be advanced as > > > tactical > > > >> > > > changes, > > > >> > > > > but we may end up with a set of abstractions that are > locally > > > >> usable > > > >> > but > > > >> > > > > globally inconsistent. > > > >> > > > > > > > >> > > > > On RpcOperator, I agree that we need to be very careful in > > > defining > > > >> > the > > > >> > > > > boundary between the Flink runtime and external > orchestration > > > >> > systems. > > > >> > > > > Kubernetes or the Kubernetes Operator may well be the right > > > choice > > > >> > at the > > > >> > > > > physical deployment level. But I still believe Flink needs a > > > >> > first-class > > > >> > > > > RpcOperator abstraction, because deployment is only part of > > the > > > >> > problem — > > > >> > > > > the harder part is its semantic integration with the Flink > > job. > > > >> > > > > > > > >> > > > > If model inference is part of the logical data flow, Flink > > > needs at > > > >> > > > minimum > > > >> > > > > to be aware of its service discovery, backpressure behavior, > > > >> failover > > > >> > > > > behavior, in-flight request draining, and scaling > > coordination. > > > If > > > >> > it is > > > >> > > > > hidden entirely behind an external black-box service, it is > > hard > > > >> for > > > >> > > > Flink > > > >> > > > > to provide consistent job-level semantics and operational > > > >> experience. > > > >> > > > > > > > >> > > > > So the point of RpcOperator is not necessarily that "every > > > physical > > > >> > > > process > > > >> > > > > must be directly launched and managed by Flink core," but > that > > > >> Flink > > > >> > > > needs > > > >> > > > > to define a service-style operator contract that allows such > > > >> > operators to > > > >> > > > > be invoked correctly by the data flow, coordinated correctly > > by > > > the > > > >> > > > > runtime, and understood and operated by users as part of a > > Flink > > > >> job. > > > >> > > > > > > > >> > > > > On vectorized batch processing, I agree the long-term > > direction > > > >> > should > > > >> > > > not > > > >> > > > > stop at Python. Native columnar / vectorized execution is an > > > >> > end-to-end > > > >> > > > > problem that touches connectors, formats, the type system, > > > runtime, > > > >> > Java, > > > >> > > > > SQL, and Python. The current proposal starts from the > > > Java/Python > > > >> > > > boundary > > > >> > > > > because that is where the row/column conversion overhead is > > most > > > >> > visible. 
> > > >> > > > > End-to-end columnar execution on the Java and SQL side > deserves > to > >> be > > > >> > > > > discussed further as a separate, larger FLIP. > > > >> > > > > > > > >> > > > > On multimodal types and SerDes complexity, I agree this > needs > > > to be > > > >> > > > handled > > > >> > > > > carefully. Making AI-related objects first-class does not > > imply > > > >> that > > > >> > > > every > > > >> > > > > connector must immediately and fully support image, video, > > > audio, > > > >> > tensor, > > > >> > > > > and so on. The concrete incremental path, fallback strategy, > > and > > > >> the > > > >> > > > > boundary between formats, connector API, and the type system > > > will > > > >> be > > > >> > > > > discussed further in the corresponding sub-FLIPs. > > > >> > > > > > > > >> > > > > Coming back to the core of the proposal: it is not about > > turning > > > >> > Flink > > > >> > > > into > > > >> > > > > an AI framework. It is about making the core objects and > > > execution > > > >> > > > patterns > > > >> > > > > of AI-oriented data processing first-class citizens in > Flink. > > > >> > > > > > > > >> > > > > Best, > > > >> > > > > Guowei > > > >> > > > > > > > >> > > > > > > > >> > > > > On Thu, Apr 30, 2026 at 5:37 AM Yaroslav Tkachenko < > > > >> > [email protected] > > > >> > > > > > > > >> > > > > wrote: > > > >> > > > > > > > >> > > > > > Hi Guowei, > > > >> > > > > > > > > >> > > > > > Thank you for writing this proposal. > > > >> > > > > > > > > >> > > > > > I may be in the minority here, but I hope my voice will be > > > >> heard. I > > > >> > > > > > disagree with turning Flink into an "AI-Native" engine. > > > >> > > > > > > > > >> > > > > > Regarding your "Data processing is entering the AI era, > and > > > Flink > > > >> > > > needs to > > > >> > > > > > evolve from a traditional BI compute engine into a data > > engine > > > >> that > > > >> > > > > > natively supports AI workloads" claim: > > > >> > > > > > > > > >> > > > > > - How exactly do you define "AI"? I don't believe there > is a > > > >> > standard > > > >> > > > > > definition. For example, Machine Learning has been around > > for > > > >> more > > > >> > > > than a > > > >> > > > > > decade, but there were no proposals (or need, in my > opinion) > > > to > > > >> > turn > > > >> > > > Flink > > > >> > > > > > into an "ML-Native" engine. Flink, in its current state, > has > > > >> > > > > > been successfully used in many systems alongside dedicated > > ML > > > >> > > > technologies, > > > >> > > > > > like feature stores. Based on the context of your > proposal, > > it > > > >> > looks > > > >> > > > like > > > >> > > > > > you mostly mean LLMs, so could you be specific about the > > > >> language? > > > >> > > > > > - I wouldn't call Flink "a traditional BI compute engine". > > > Flink > > > >> > is a > > > >> > > > > > general data processing technology which can be used for a > > > >> variety > > > >> > of > > > >> > > > use > > > >> > > > > > cases without any BI involvement. > > > >> > > > > > - Do you have any proof that "Users' core workloads are > > > rapidly > > > >> > > > evolving" > > > >> > > > > > and that they require your proposed changes? Case studies, > > > user > > > >> > > > surveys, or > > > >> > > > > > submitted issues about the lack of support? Big changes > like > > > that > > > >> > > > require > > > >> > > > > > extensive validation.
> > > >> > > > > > - And even if there is a real need to adopt some > LLM-driven > > > >> > changes, > > > >> > > > why > > > >> > > > > > now? The LLM-related tooling has been changing so rapidly, > > and > > > >> it's > > > >> > > > hard to > > > >> > > > > > predict what will be needed tomorrow. Why does it make > sense > > > to > > > >> > > > introduce > > > >> > > > > > changes now, and not wait for more standardization and > > > >> > consolidation? > > > >> > > > > > > > > >> > > > > > To summarize, I think there are a lot of great ideas in > the > > > >> > proposal, > > > >> > > > but > > > >> > > > > > in my mind, they need to be addressed as tactical, focused > > > >> > changes, not > > > >> > > > > > under the "AI-Native" umbrella. > > > >> > > > > > > > > >> > > > > > I also wanted to address a few more specific points: > > > >> > > > > > > > > >> > > > > > - RpcOperator, why does it need to be managed by Flink? I > > see > > > >> > > > absolutely no > > > >> > > > > > need to introduce the additional complexity of > orchestrating > > > >> > standalone > > > >> > > > > > components into the core Flink engine. I can imagine a > > > separate > > > >> > > > sub-project > > > >> > > > > > for an RpcOperator, which could potentially be managed by > > the > > > >> > > > Kubernetes > > > >> > > > > > Operator. > > > >> > > > > > - You make the case for the vectorized batch processing, > but > > > only > > > >> > on > > > >> > > > the > > > >> > > > > > Python side. Why stop there? Native columnar vectorized > > > execution > > > >> > will > > > >> > > > > > require end-to-end changes, including connectors, data > > format > > > >> > support, > > > >> > > > Type > > > >> > > > > > system support, runtime changes, etc. It seems logical to > me > > > to > > > >> > support > > > >> > > > > > this execution mode for Java and SQL as well. > > > >> > > > > > - Supporting many more data types natively (images, video, > > > audio, > > > >> > > > tensors) > > > >> > > > > > will make connector serializers and deserializers (SerDes) > > > much > > > >> > more > > > >> > > > > > challenging to implement. Even today, many SerDes in > > > officially > > > >> > > > supported > > > >> > > > > > connectors don't fully implement types like arrays and > > > structs. > > > >> > > > > > > > > >> > > > > > Thank you. > > > >> > > > > > > > > >> > > > > > On Wed, Apr 29, 2026 at 1:18 AM Guowei Ma < > > > [email protected]> > > > >> > > > wrote: > > > >> > > > > > > > > >> > > > > > > Hi Z > > > >> > > > > > > > > > >> > > > > > > Thanks for the kind words and the thoughtful questions. > > Let > > > me > > > >> > take > > > >> > > > them > > > >> > > > > > > one by one. > > > >> > > > > > > > > > >> > > > > > > 1. Throughput and latency targets > > > >> > > > > > > > > > >> > > > > > > To be honest, I don't have concrete numbers to share > yet. > > > What > > > >> I > > > >> > can > > > >> > > > say > > > >> > > > > > is > > > >> > > > > > > that our internal testing has already surfaced several > > > >> directions > > > >> > > > where > > > >> > > > > > > Flink can be improved, and at the same time we want to > > fully > > > >> > leverage > > > >> > > > > > > Flink's existing streaming shuffle capabilities. As the > > > >> > multimodal > > > >> > > > > > operator > > > >> > > > > > > library matures, we'll progressively publish benchmark > > > results. > > > >> > > > > > > > > > >> > > > > > > 2. Built-in operators > > > >> > > > > > > > > > >> > > > > > > You're absolutely right. 
From what I've seen, our > internal > > > >> users > > > >> > > > already > > > >> > > > > > > rely on a fairly large set of multimodal operators — > > > >> potentially > > > >> > > > 100+. > > > >> > > > > > The > > > >> > > > > > > exact set the community should provide is best discussed > > in > > > >> > FLIP-XXX: > > > >> > > > > > > Built-in Multimodal Operators and AI Functions, and > > > >> contributions > > > >> > > > from > > > >> > > > > > the > > > >> > > > > > > community are very welcome there. > > > >> > > > > > > > > > >> > > > > > > 3. Plan for the 11 sub-FLIPs > > > >> > > > > > > > > > >> > > > > > > The sequencing follows the layering in the umbrella: > > > >> > > > > > > > > > >> > > > > > > - Layer 1 (Core Primitives) should be discussed and > > > aligned > > > >> > first, > > > >> > > > > > since > > > >> > > > > > > the second and third layers build on it. > > > >> > > > > > > - Layer 2 (API + compilation + single-node execution) > > > starts > > > >> > with > > > >> > > > > > > getting the API discussion right — the Python API, > how > > > UDFs > > > >> > > > declare > > > >> > > > > > > resources, etc. — after which the single-node > execution > > > work > > > >> > can > > > >> > > > build > > > >> > > > > > > on > > > >> > > > > > > top. > > > >> > > > > > > - Layer 3 (distributed scheduling and checkpointing) > > can > > > >> > largely > > > >> > > > > > proceed > > > >> > > > > > > independently in parallel. > > > >> > > > > > > > > > >> > > > > > > So while each sub-FLIP is indeed a substantial piece of > > > work, > > > >> > most of > > > >> > > > > > them > > > >> > > > > > > can be advanced in parallel by different contributors > once > > > the > > > >> > Layer > > > >> > > > 1 > > > >> > > > > > > primitives are settled. > > > >> > > > > > > > > > >> > > > > > > 4. GPU scheduling roadmap > > > >> > > > > > > > > > >> > > > > > > Could you expand a bit on which aspect of GPU scheduling > > you > > > >> > have in > > > >> > > > mind > > > >> > > > > > > as the complex one? "GPU scheduling" covers a fairly > wide > > > >> surface > > > >> > > > area > > > >> > > > > > > (resource declaration, operator-level deployment, > elastic > > > >> > scaling, > > > >> > > > > > > heterogeneous GPU types, fine-grained partitioning, > etc.), > > > and > > > >> > the > > > >> > > > answer > > > >> > > > > > > differs quite a bit depending on which dimension we're > > > >> > discussing. > > > >> > > > Once I > > > >> > > > > > > understand your specific concern I can give a more > useful > > > >> > response. > > > >> > > > > > > > > > >> > > > > > > Thanks again for the support — looking forward to the > > > continued > > > >> > > > > > discussion. > > > >> > > > > > > > > > >> > > > > > > Best, > > > >> > > > > > > Guowei > > > >> > > > > > > > > > >> > > > > > > > > > >> > > > > > > On Tue, Apr 28, 2026 at 4:34 PM zl z < > > [email protected] > > > > > > > >> > wrote: > > > >> > > > > > > > > > >> > > > > > > > Hey Guowei, > > > >> > > > > > > > > > > >> > > > > > > > Thanks for the proposal, and I think this is very > > > valuable. I > > > >> > have > > > >> > > > some > > > >> > > > > > > > question about it: > > > >> > > > > > > > > > > >> > > > > > > > 1. What are our expected throughput and latency > targets? > > > Do > > > >> we > > > >> > > > have any > > > >> > > > > > > > forward-looking tests for this? > > > >> > > > > > > > > > > >> > > > > > > > 2. AI involves a very large number of operators. 
> > > Thanks again for the support — looking forward to the continued
> > > discussion.
> > >
> > > Best,
> > > Guowei
> > >
> > > On Tue, Apr 28, 2026 at 4:34 PM zl z <[email protected]> wrote:
> > >
> > > > Hey Guowei,
> > > >
> > > > Thanks for the proposal; I think this is very valuable. I have some
> > > > questions about it:
> > > >
> > > > 1. What are our expected throughput and latency targets? Do we have
> > > > any forward-looking tests for this?
> > > >
> > > > 2. AI involves a very large number of operators. Besides allowing
> > > > users to use them through UDFs, will we also provide commonly used
> > > > built-in operators?
> > > >
> > > > 3. Each of the 11 sub-FLIPs is a major project involving a
> > > > significant amount of change. What is our plan for this?
> > > >
> > > > 4. GPU scheduling is extremely complex. What is our current roadmap
> > > > for this?
> > > >
> > > > This is a very high-quality and exciting proposal. Making Flink an
> > > > AI-native data processing engine will make it far more valuable in
> > > > the AI era. I look forward to seeing it land and come to fruition
> > > > soon.
> > > >
> > > > On Tue, Apr 28, 2026 at 2:38 PM Robert Metzger <[email protected]>
> > > > wrote:
> > > >
> > > > > Hey Guowei,
> > > > >
> > > > > Thanks for the proposal. I just took a brief look; here are some
> > > > > high-level questions:
> > > > >
> > > > > Regarding the RPC Operator: what is the difference to the async I/O
> > > > > operator we already have?
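> > > > > For reference, this is roughly what the existing async I/O operator
> > > > > already gives us; the model client below is a stub invented for
> > > > > illustration, not part of any proposal:
> > > > >
> > > > >     import java.util.Collections;
> > > > >     import java.util.concurrent.CompletableFuture;
> > > > >     import java.util.concurrent.TimeUnit;
> > > > >     import org.apache.flink.streaming.api.datastream.AsyncDataStream;
> > > > >     import org.apache.flink.streaming.api.datastream.DataStream;
> > > > >     import org.apache.flink.streaming.api.functions.async.ResultFuture;
> > > > >     import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
> > > > >
> > > > >     // Fires inference calls from inside a task; the operator already
> > > > >     // handles ordering, timeouts, and in-flight capacity.
> > > > >     class InferenceAsyncFunction extends RichAsyncFunction<String, String> {
> > > > >         @Override
> > > > >         public void asyncInvoke(String input, ResultFuture<String> result) {
> > > > >             callModel(input).whenComplete((response, err) -> {
> > > > >                 if (err != null) {
> > > > >                     result.completeExceptionally(err);
> > > > >                 } else {
> > > > >                     result.complete(Collections.singletonList(response));
> > > > >                 }
> > > > >             });
> > > > >         }
> > > > >
> > > > >         // Stand-in for a real async HTTP/gRPC model client.
> > > > >         private CompletableFuture<String> callModel(String input) {
> > > > >             return CompletableFuture.completedFuture("label-for:" + input);
> > > > >         }
> > > > >     }
> > > > >
> > > > >     // Wiring: at most 100 in-flight requests, 30s timeout, unordered results.
> > > > >     // DataStream<String> scored = AsyncDataStream.unorderedWait(
> > > > >     //         input, new InferenceAsyncFunction(), 30, TimeUnit.SECONDS, 100);
> > > > >
> > > > > If RpcOperator mainly adds things like service lifecycle, batching,
> > > > > or placement on top of this, it would help to spell that delta out
> > > > > in the sub-FLIP.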
> > > > > "Connector API for Multimodal Data Source/Sink": why do we need to
> > > > > touch the connector API to support multimodal data? Isn't this more
> > > > > of a formats concern?
> > > > >
> > > > > "Non-Disruptive Scaling for CPU Operators": how do you want to
> > > > > guarantee exactly-once with that kind of scaling? E.g., you need to
> > > > > somehow make a handover between the old and the new pipeline.
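> > > > > To make the question concrete, this is the kind of sequence I would
> > > > > expect the engine to have to drive; it is purely illustrative and
> > > > > not taken from the FLIP:
> > > > >
> > > > >     // Hypothetical phases of an exactly-once handover during rescaling.
> > > > >     enum HandoverPhase {
> > > > >         PREPARE_NEW,     // deploy the rescaled subtasks, keep them idle
> > > > >         DRAIN_OLD,       // stop consuming, flush in-flight records
> > > > >         SNAPSHOT,        // take a consistent snapshot of drained state
> > > > >         RESTORE_NEW,     // load the snapshot into the rescaled subtasks
> > > > >         SWITCH_CHANNELS, // atomically repoint input/output channels
> > > > >         RETIRE_OLD       // only now release the old subtasks
> > > > >     }
> > > > >
> > > > >     class HandoverProtocol {
> > > > >         // Exactly-once only holds if a failure in any earlier phase
> > > > >         // can fall back to the old pipeline, so the old subtasks
> > > > >         // must stay recoverable until the final phase.
> > > > >         static boolean canRollback(HandoverPhase phase) {
> > > > >             return phase != HandoverPhase.RETIRE_OLD;
> > > > >         }
> > > > >     }
> > > > >
> > > > > Each of these steps needs a failure story before the scaling can be
> > > > > called both non-disruptive and exactly-once.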
> > > > > Overall, I find the proposal has some things which seem related to
> > > > > making Flink more AI-native, while other changes seem orthogonal to
> > > > > that. For example, the checkpoint and scaling changes are actually
> > > > > unrelated to AI and are simply engine improvements.
> > > > >
> > > > > On Tue, Apr 28, 2026 at 5:48 AM Guowei Ma <[email protected]> wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > I'd like to start a discussion on an umbrella FLIP [1] that lays
> > > > > > out a direction for evolving Flink into a data engine that
> > > > > > natively supports AI workloads.
> > > > > >
> > > > > > The short version: user workloads are shifting from BI analytics
> > > > > > to multimodal data processing centered on model inference, and
> > > > > > this triggers cascading changes across the stack — multimodal data
> > > > > > flowing through pipelines, heterogeneous CPU/GPU resources,
> > > > > > vectorized execution, and inference tasks that run for seconds to
> > > > > > minutes on Spot instances. The proposal sketches an evolution
> > > > > > along five directions (development paradigm, data model,
> > > > > > heterogeneous resources, execution engine, fault tolerance),
> > > > > > decomposed into 11 sub-FLIPs organized into three layers: core
> > > > > > runtime primitives, AI workload expression and execution, and
> > > > > > production-grade operational guarantees. Most sub-FLIPs have no
> > > > > > hard dependencies on each other and can be advanced in parallel.
> > > > > >
> > > > > > A note on scope, since it's an umbrella:
> > > > > >
> > > > > > - In scope here: whether the evolution directions are reasonable,
> > > > > > whether each sub-FLIP's motivation and proposed approach are
> > > > > > well-founded, and whether the boundaries and dependencies between
> > > > > > sub-FLIPs are clear.
> > > > > > - Out of scope here: detailed designs, API specifics, and
> > > > > > implementation plans of individual sub-FLIPs — those will go
> > > > > > through their own FLIPs.
> > > > > > - Consensus criteria: agreement on the overall direction is
> > > > > > sufficient for the umbrella to pass; passing it does not lock in
> > > > > > any sub-FLIP's design — sub-FLIPs may still be adjusted, deferred,
> > > > > > or withdrawn as they progress.
> > > > > >
> > > > > > All proposed changes are incremental — no existing API or behavior
> > > > > > is removed or altered. Compatibility details are covered at the
> > > > > > end of the document.
> > > > > >
> > > > > > Looking forward to your feedback on the overall direction and the
> > > > > > layering.
> > > > > >
> > > > > > [1]
> > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=421957275
> > > > > >
> > > > > > Thanks,
> > > > > > Guowei
