Hi Guys I've just recently started using Apache Flink to evaluate its suitability for a project I'm working on.
First impressions are that the project is great, well documented and has lots of examples and guidance showcasing the multitude of things that it can do. Challenging knowing where to start at times as there are many ways to achieve the same result. So my pipeline is similar to an ETL, I have a continuous DataStream source of Java records modelled as POJOs which I then transform each POJO to a single JSON record before writing to a streaming sink. This all works as expected. However my question is in 2 parts and I hope you can help, apologies in advance if this question highlights my lack of experience. - Can I get access to the infight records that are currently being processed within Flink ? By inflight I mean the records that are currently being processed but haven't been written to the sink. - Is the number of inflight records deterministic? How many records does Flink process per subtask/thread. For example, it might be 1 record at a time per subtask? Thanks Declan