Re: [DISCUSS] FLIP-531: Initiate Flink Agents as a new Sub-Project

2025-05-24 Thread Robert Metzger
Thanks for the nice proposal.

One question: The proposal talks a lot about establishing a "sub project".
If I understand correctly, the ASF has a concept of subprojects, with
sub-project committers, mailing lists, jira projects, .. etc. [1][2].

Is the intention of this proposal to establish such a sub project?
Or is the intention to basically create a "flink-agents" git repository,
where all existing Flink committers have access to, and the Flink PMC votes
on releases? (I assume this is the intention). If so, I would update the
proposal to talk about a new repository? or at least clarify the immediate
implications for the project.

My second question is about this key feature:
> *Inter-Agent Communication:* Built-in support for asynchronous
agent-to-agent communication using Kafka.

Does this mean the code from the flink-agents repo will have a dependency
on AK? One of the big benefits of Flink is that it is independent of the
underlying message streaming system. Wouldn't it be more elegant and
actually easier to rely on the Flink connector framework here, and leave
the concrete implementation to the user?
Also, I wonder why we need to rely on an external message streaming system
at all? Is it because we want to be able to send messages into arbitrary
directions? if so, maybe we can re-use code from Flink Statefun? I
personally would think that relying on Flink's internal data transfer model
by default brings a lot of cost, performance, operations and implementation
benefits ... and users can still manually setup a connector using a Kafka,
Pulsar or PubSub connection. WDYT?

Best,
Robert


[1]
https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Sub+Projects
[2]
https://cwiki.apache.org/confluence/display/HADOOP/Apache+Hadoop+Ozone+-+sub-project+to+Apache+TLP+proposal


On Fri, May 23, 2025 at 6:14 AM Xintong Song  wrote:

> @Jing,
>
> I think the FLIP already included the high-level design goals, by listing
> the key features that we plan to support in the Proposed Solution section,
> and demonstrating how using the framework may look like with the code
> examples. Of course the high-level goals need to be further detailed, which
> will be the next step. The purpose of this FLIP is to get community
> consensus on initiating this new project. On the other hand, technical
> design takes time to discuss, and likely requires continuous iteration as
> the project is being developed. So I think it makes sense to separate the
> design discussions from the initiation proposal.
>
> Of course any contributor's thoughts and inputs are valuable to the
> project. And efficiency also matters, as the agentic ai industry grows
> fast, we really need to keep up with the pace. I believe it would be more
> efficient to come up with some initial draft design / implementation that
> everyone can comment on, compared to just randomly collecting ideas when we
> have nothing. Fortunately, the project is at the early stage with no
> historical burdens, which means we don't need to carefully make sure
> everything is perfect in advance, and can always correct / change / rework
> things if needed. We can at least do that before we commit to the product
> compatibility with the first formal release. This is why we suggested
> applying a light, execution-first process, as mentioned in the Operating
> Model section. I would not be concerned too much about not collecting
> enough inputs at the beginning, because we can always adjust things
> afterwards based on new suggestions and opinions.
>
> Best,
>
> Xintong
>
>
>
> On Fri, May 23, 2025 at 12:13 AM Jing Ge 
> wrote:
>
> > It is great to see that everyone in this thread agreed with the
> high-level
> > proposal. Just so excited and could not stop asking questions :-) Thanks
> > Xintong for the update!
> >
> > I'd like to share a little bit more thoughts with my questions and your
> > additional input. And then lead to a small suggestion.
> >
> > 1. It is great to support freestyle tools beyond MCP protocol, from users
> > perspective. However, if we consider agent framework design, there might
> be
> > some choices to make. For example, either we stick to MCP internally and
> > turn such external freestyle tools into MCP internally or we will design
> a
> > new abstraction to handle diverse function calls offered by different
> > LLMs, kind of repeating what MCP did.  Another thought, which I feel, is
> > that the sample API in the FLIP shows somehow, as a user, after a MCP
> > server registration, I could use the close follow-up prompt() method to
> > modify/extend the standard out-of-box context provided by the MCP server.
> > But it is too detailed and should not be discussed in this high-level
> > thread. Happy to join any (offline) discussion and contribute.
> >
> > 3. Similar to microservices, there are a few use cases that are sensitive
> > to the response latency, e.g. stock trading, etc. But it is totally fine
> to
> > focus on asynchronous communication.
> >
>

Re: [DISCUSS] FLIP-531: Initiate Flink Agents as a new Sub-Project

2025-05-24 Thread Yuan Mei
Thanks Xintong, Sean and Chris.

This is a great step forward for the future of Flink. I'm really looking
forward to it!

Best,
Yuan

On Sat, May 24, 2025 at 10:00 PM Robert Metzger  wrote:

> Thanks for the nice proposal.
>
> One question: The proposal talks a lot about establishing a "sub project".
> If I understand correctly, the ASF has a concept of subprojects, with
> sub-project committers, mailing lists, jira projects, .. etc. [1][2].
>
> Is the intention of this proposal to establish such a sub project?
> Or is the intention to basically create a "flink-agents" git repository,
> where all existing Flink committers have access to, and the Flink PMC votes
> on releases? (I assume this is the intention). If so, I would update the
> proposal to talk about a new repository? or at least clarify the immediate
> implications for the project.
>
> My second question is about this key feature:
> > *Inter-Agent Communication:* Built-in support for asynchronous
> agent-to-agent communication using Kafka.
>
> Does this mean the code from the flink-agents repo will have a dependency
> on AK? One of the big benefits of Flink is that it is independent of the
> underlying message streaming system. Wouldn't it be more elegant and
> actually easier to rely on the Flink connector framework here, and leave
> the concrete implementation to the user?
> Also, I wonder why we need to rely on an external message streaming system
> at all? Is it because we want to be able to send messages into arbitrary
> directions? if so, maybe we can re-use code from Flink Statefun? I
> personally would think that relying on Flink's internal data transfer model
> by default brings a lot of cost, performance, operations and implementation
> benefits ... and users can still manually setup a connector using a Kafka,
> Pulsar or PubSub connection. WDYT?
>
> Best,
> Robert
>
>
> [1]
>
> https://cwiki.apache.org/confluence/display/CASSANDRA/Cassandra+Sub+Projects
> [2]
>
> https://cwiki.apache.org/confluence/display/HADOOP/Apache+Hadoop+Ozone+-+sub-project+to+Apache+TLP+proposal
>
>
> On Fri, May 23, 2025 at 6:14 AM Xintong Song 
> wrote:
>
> > @Jing,
> >
> > I think the FLIP already included the high-level design goals, by listing
> > the key features that we plan to support in the Proposed Solution
> section,
> > and demonstrating how using the framework may look like with the code
> > examples. Of course the high-level goals need to be further detailed,
> which
> > will be the next step. The purpose of this FLIP is to get community
> > consensus on initiating this new project. On the other hand, technical
> > design takes time to discuss, and likely requires continuous iteration as
> > the project is being developed. So I think it makes sense to separate the
> > design discussions from the initiation proposal.
> >
> > Of course any contributor's thoughts and inputs are valuable to the
> > project. And efficiency also matters, as the agentic ai industry grows
> > fast, we really need to keep up with the pace. I believe it would be more
> > efficient to come up with some initial draft design / implementation that
> > everyone can comment on, compared to just randomly collecting ideas when
> we
> > have nothing. Fortunately, the project is at the early stage with no
> > historical burdens, which means we don't need to carefully make sure
> > everything is perfect in advance, and can always correct / change /
> rework
> > things if needed. We can at least do that before we commit to the product
> > compatibility with the first formal release. This is why we suggested
> > applying a light, execution-first process, as mentioned in the Operating
> > Model section. I would not be concerned too much about not collecting
> > enough inputs at the beginning, because we can always adjust things
> > afterwards based on new suggestions and opinions.
> >
> > Best,
> >
> > Xintong
> >
> >
> >
> > On Fri, May 23, 2025 at 12:13 AM Jing Ge 
> > wrote:
> >
> > > It is great to see that everyone in this thread agreed with the
> > high-level
> > > proposal. Just so excited and could not stop asking questions :-)
> Thanks
> > > Xintong for the update!
> > >
> > > I'd like to share a little bit more thoughts with my questions and your
> > > additional input. And then lead to a small suggestion.
> > >
> > > 1. It is great to support freestyle tools beyond MCP protocol, from
> users
> > > perspective. However, if we consider agent framework design, there
> might
> > be
> > > some choices to make. For example, either we stick to MCP internally
> and
> > > turn such external freestyle tools into MCP internally or we will
> design
> > a
> > > new abstraction to handle diverse function calls offered by different
> > > LLMs, kind of repeating what MCP did.  Another thought, which I feel,
> is
> > > that the sample API in the FLIP shows somehow, as a user, after a MCP
> > > server registration, I could use the close follow-up prompt() method to
> > > modify/extend the standa

[DISCUSS] FLIP-XXX: Flink Events Reporter System

2025-05-24 Thread Kartikey Pant
Hi Flink Devs,

I’m Kartikey Pant. Drawing on my experience with large-scale Flink
pipelines and AI/ML, I believe Flink needs richer, structured event data
for advanced tuning, AIOps, and deeper observability - moving beyond
current metrics and log scraping.

To help with this, I've drafted a proposal for a new Flink EventsReporter
System. The core idea is to create something familiar, based on how
MetricReporters work, but focused on emitting key operational events in a
structured way.

For V1, I'm suggesting we start focused and prioritize stability:

   -

   Build the basic asynchronous reporting framework.
   -

   Emit critical events like Job Status changes & Checkpoint results (as
   JSON).
   -

   Include a simple FileEventsReporter so it's useful right away.

You can read the full proposal here:
https://docs.google.com/document/d/1R4fmOTQDLZcUQwgmCCxoRb74MGPZOypiScUbKL43AL4

I'm eager for your feedback. Does this V1 approach make sense, or am I
overlooking anything? I'm looking to get more involved in Flink
development, and your insights and guidance here would be incredibly
helpful.

Thanks a lot,

Kartikey Pant


[jira] [Created] (FLINK-37840) Row writer should honor null uncompact BigDecimal and Timestamp

2025-05-24 Thread Deng Ziming (Jira)
Deng Ziming created FLINK-37840:
---

 Summary: Row writer should honor null uncompact BigDecimal and 
Timestamp
 Key: FLINK-37840
 URL: https://issues.apache.org/jira/browse/FLINK-37840
 Project: Flink
  Issue Type: Improvement
Reporter: Deng Ziming






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-37841) Having adaptive retry strategy causing issues in DDB Connector

2025-05-24 Thread Abhi Gupta (Jira)
Abhi Gupta created FLINK-37841:
--

 Summary: Having adaptive retry strategy causing issues in DDB 
Connector
 Key: FLINK-37841
 URL: https://issues.apache.org/jira/browse/FLINK-37841
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / DynamoDB
Reporter: Abhi Gupta


AWS docs say the following:



> The adaptive retry strategy includes all the features of the standard 
> strategy and adds a client-side rate limiter that measures the rate of 
> throttled requests compared to non-throttled requests. The strategy uses this 
> measurement to slow down the requests in an attempt to stay within a safe 
> bandwidth, ideally causing zero throttling errors.

 

This means requests maybe getting slowed down and the connector not achieving 
desired performance in things like shard discovery. Having standard retry 
strategy should be able to better fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)