Hi Kartikey,

Thanks for the FLIP. I think the Events Reporter System is a very good idea for Flink. We really do need deeper insight than metrics alone provide, especially for achieving autonomous operations.
The V1 plan looks very practical. Focusing on the core parts and using asynchronous dispatch for stability is a good strategy.

I have one question about future reporters. I would like to use a Kafka Events Reporter later on. How does the V1 design make it straightforward to develop such a Kafka reporter?

Thanks

On Sat, May 24, 2025 at 11:07 PM Kartikey Pant <kartikeypant....@gmail.com> wrote:

> Hi Flink Devs,
>
> I’m Kartikey Pant. Drawing on my experience with large-scale Flink
> pipelines and AI/ML, I believe Flink needs richer, structured event data
> for advanced tuning, AIOps, and deeper observability - moving beyond
> current metrics and log scraping.
>
> To help with this, I've drafted a proposal for a new Flink EventsReporter
> System. The core idea is to create something familiar, based on how
> MetricReporters work, but focused on emitting key operational events in a
> structured way.
>
> For V1, I'm suggesting we start focused and prioritize stability:
>
> - Build the basic asynchronous reporting framework.
> - Emit critical events like Job Status changes & Checkpoint results (as
>   JSON).
> - Include a simple FileEventsReporter so it's useful right away.
>
> You can read the full proposal here:
>
> https://docs.google.com/document/d/1R4fmOTQDLZcUQwgmCCxoRb74MGPZOypiScUbKL43AL4
>
> I'm eager for your feedback. Does this V1 approach make sense, or am I
> overlooking anything? I'm looking to get more involved in Flink
> development, and your insights and guidance here would be incredibly
> helpful.
>
> Thanks a lot,
>
> Kartikey Pant