Hi Dong and Yunfeng,

Thanks for bringing up this discussion.
As described in the FLIP, the differences between `end-to-end latency` and `table.exec.mini-batch.allow-latency` are: "It allows users to specify the end-to-end latency, whereas table.exec.mini-batch.allow-latency applies to each operator. If there are N operators on the path from source to sink, the end-to-end latency could be up to table.exec.mini-batch.allow-latency * N". If I understand correctly, `table.exec.mini-batch.allow-latency` is also applied to the end-to-end latency of a job; maybe @Jack Wu can give more information.

So, from my perspective, and please correct me if I misunderstand, the targets of this FLIP may include the following:

1. Support a mechanism like `mini-batch` in SQL for `DataStream`, which collects data in the operator and flushes it out when the operator receives a `flush` event; in the FLIP this is the `FlushEvent`.

2. Support a dynamic `latency` according to the progress of the job, such as during the snapshot stage and after it.

With that in mind, I have some questions:

1. I didn't understand the purpose of the public interface `RecordAttributes`. I think the `FlushEvent` in the FLIP is enough, and different `DynamicFlushStrategy` implementations can be added to generate flush events for different needs, such as a static interval similar to mini-batch in SQL, or collecting the `isProcessingBacklog` information and metrics to generate `FlushEvent`s, as described in your FLIP. If the Hudi sink needs the `isBacklog` flag, the Hudi `SplitEnumerator` can create an operator event and send it to the Hudi source reader (see the first sketch at the end of this mail).

2. How is this new mechanism unified with SQL's mini-batch mechanism? As far as I can tell, SQL implements its mini-batch mechanism based on watermarks, and I think it would be very unreasonable to have two different implementations in SQL and DataStream.

3. I notice that the `CheckpointCoordinator` will generate `FlushEvent`s. Which information about a `FlushEvent` will be stored in the checkpoint? What is the alignment strategy for `FlushEvent`s in an operator: will the operator flush its data only when it has received the `FlushEvent` with the same ID from all upstream channels, or will it flush for each `FlushEvent` (see the second sketch below for the semantics I have in mind)? Could you give a more detailed proposal about that? We also have a need for this functionality, thanks.
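To make questions 1 and 3 more concrete, here are two rough sketches; they illustrate the spirit of what I have in mind, not a definitive design.

First, for question 1: a hypothetical `IsBacklogEvent` that a Hudi `SplitEnumerator` could broadcast to its readers. The event class and the `broadcast` helper are made up for illustration; `SourceEvent`, `registeredReaders()` and `sendEventToSourceReader()` are the existing source coordination APIs:

    import org.apache.flink.api.connector.source.SourceEvent;
    import org.apache.flink.api.connector.source.SplitEnumeratorContext;

    // Hypothetical custom event carrying the backlog flag from the
    // enumerator to the source readers; not part of the FLIP.
    public class IsBacklogEvent implements SourceEvent {

        private final boolean isBacklog;

        public IsBacklogEvent(boolean isBacklog) {
            this.isBacklog = isBacklog;
        }

        public boolean isBacklog() {
            return isBacklog;
        }

        // Called from the (hypothetical) Hudi SplitEnumerator whenever the
        // backlog status changes; broadcasts the flag to all registered readers.
        public static void broadcast(SplitEnumeratorContext<?> context, boolean isBacklog) {
            for (int subtaskId : context.registeredReaders().keySet()) {
                context.sendEventToSourceReader(subtaskId, new IsBacklogEvent(isBacklog));
            }
        }
    }

The readers can then pick the flag up in `SourceReader#handleSourceEvents` and adjust their flushing behaviour, without a new public interface like `RecordAttributes`.

Second, for question 3: the alignment semantics I would expect on the operator side, sketched very roughly. All names here are hypothetical, since the FLIP does not define this behaviour yet:

    import java.util.BitSet;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical alignment bookkeeping: the operator keeps buffering records
    // and flushes only once a FlushEvent with the same ID has arrived on every
    // upstream input channel, similar to watermark/barrier alignment.
    public class FlushEventAligner {

        private final int numInputChannels;
        private final Map<Long, BitSet> seenChannelsPerFlushId = new HashMap<>();

        public FlushEventAligner(int numInputChannels) {
            this.numInputChannels = numInputChannels;
        }

        // Returns true iff this event completes the alignment, i.e. the
        // operator should now flush its buffered data.
        public boolean onFlushEvent(long flushId, int channelIndex) {
            BitSet seen = seenChannelsPerFlushId.computeIfAbsent(
                    flushId, id -> new BitSet(numInputChannels));
            seen.set(channelIndex);
            if (seen.cardinality() == numInputChannels) {
                seenChannelsPerFlushId.remove(flushId);
                return true;
            }
            return false;
        }
    }

With alignment like this, an operator with two upstreams would flush exactly once per flush ID rather than once per received `FlushEvent`, which is the distinction I am asking about.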
Best,
Shammon FY

On Thu, Jun 29, 2023 at 4:35 PM Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Dong and Yunfeng,
>
> Thanks for the FLIP. What's not clear for me is what's the expected
> behaviour when the allowed latency can't be met, for whatever reason.
> Given that we're talking about an "allowed latency", it implies that
> something has gone wrong and should fail? Isn't this more a minimum
> latency that you're proposing?
>
> There's also the part about the Hudi Sink processing records
> immediately upon arrival. Given that the SinkV2 API provides the
> ability for custom post and pre-commit topologies [1], specifically
> targeted to avoid generating multiple small files, why isn't that
> sufficient for the Hudi Sink? It would be great to see that added
> under Rejected Alternatives if this is indeed not sufficient.
>
> Best regards,
>
> Martijn
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-191%3A+Extend+unified+Sink+interface+to+support+small+file+compaction
>
> On Sun, Jun 25, 2023 at 4:25 AM Yunfeng Zhou
> <flink.zhouyunf...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Dong(cc'ed) and I are opening this thread to discuss our proposal to
> > support configuring end-to-end allowed latency for Flink jobs, which
> > has been documented in FLIP-325
> > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-325%3A+Support+configuring+end-to-end+allowed+latency>.
> >
> > By configuring the latency requirement for a Flink job, users would be
> > able to optimize the throughput and overhead of the job while still
> > acceptably increasing latency. This approach is particularly useful
> > when dealing with records that do not require immediate processing and
> > emission upon arrival.
> >
> > Please refer to the FLIP document for more details about the proposed
> > design and implementation. We welcome any feedback and opinions on
> > this proposal.
> >
> > Best regards.
> >
> > Dong and Yunfeng