Hi Flink Devs,

I'm Kartikey Pant. Drawing on my experience with large-scale Flink pipelines and AI/ML, I believe Flink needs richer, structured event data for advanced tuning, AIOps, and deeper observability, beyond what current metrics and log scraping provide.

To help with this, I've drafted a proposal for a new Flink EventsReporter system. The core idea is to create something familiar, based on how MetricReporters work, but focused on emitting key operational events in a structured way.

For V1, I'm suggesting we start focused and prioritize stability:

- Build the basic asynchronous reporting framework.
- Emit critical events like job status changes and checkpoint results (as JSON).
- Include a simple FileEventsReporter so it's useful right away.

You can read the full proposal here:
https://docs.google.com/document/d/1R4fmOTQDLZcUQwgmCCxoRb74MGPZOypiScUbKL43AL4

I'm eager for your feedback. Does this V1 approach make sense, or am I overlooking anything? I'm looking to get more involved in Flink development, and your insights and guidance here would be incredibly helpful.

Thanks a lot,
Kartikey Pant
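
P.S. To make the V1 scope more concrete, here is a rough, illustrative Java sketch of how a file-based reporter could behave. It is not taken from the proposal document and uses no Flink APIs. The `EventsReporter` interface, the `report(...)` signature, the event type strings, and the JSON field names are all placeholders assumed for this example, and the real design may well differ. It only shows the three V1 pieces working together: an asynchronous hand-off, JSON-encoded events, and a simple file sink.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class EventsReporterSketch {

    /** Hypothetical interface; the name and signature are assumptions, not Flink API. */
    interface EventsReporter extends AutoCloseable {
        void report(String eventType, String jobId, String detail);

        @Override
        void close() throws InterruptedException;
    }

    /** Appends one JSON object per line to a file, off the caller's thread. */
    static final class FileEventsReporter implements EventsReporter {
        private final Path file;
        // A single writer thread keeps events in submission order.
        private final ExecutorService writer = Executors.newSingleThreadExecutor();

        FileEventsReporter(Path file) {
            this.file = file;
        }

        @Override
        public void report(String eventType, String jobId, String detail) {
            // Build the JSON line on the caller's thread; only the file I/O is deferred.
            String json = String.format(
                    "{\"type\":\"%s\",\"jobId\":\"%s\",\"detail\":\"%s\",\"timestamp\":%d}\n",
                    escape(eventType), escape(jobId), escape(detail),
                    System.currentTimeMillis());
            writer.execute(() -> append(json));
        }

        private void append(String json) {
            try {
                Files.write(file, json.getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            } catch (IOException e) {
                // A reporting failure is logged and dropped; it never reaches the caller.
                System.err.println("Failed to write event: " + e.getMessage());
            }
        }

        /** Minimal escaping of backslashes and quotes only; control characters are not handled. */
        private static String escape(String s) {
            return s.replace("\\", "\\\\").replace("\"", "\\\"");
        }

        @Override
        public void close() throws InterruptedException {
            // Stop accepting events, then wait briefly for queued writes to finish.
            writer.shutdown();
            writer.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempFile("flink-events", ".jsonl");
        FileEventsReporter reporter = new FileEventsReporter(out);
        reporter.report("JOB_STATUS_CHANGED", "job-1", "RUNNING");
        reporter.report("CHECKPOINT_COMPLETED", "job-1", "checkpointId=7");
        reporter.close();
        Files.readAllLines(out).forEach(System.out::println);
    }
}
```

The single-threaded executor is just the simplest way to keep file I/O off the calling thread while preserving event order. A real implementation would likely need a bounded queue and a proper JSON library, which this sketch leaves out.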