[ https://issues.apache.org/jira/browse/FLINK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284024#comment-17284024 ]
Steven Zhen Wu commented on FLINK-21364:
----------------------------------------

For pull-based sources (like file/Iceberg), it is probably more natural and efficient to piggyback the `finishedSplitIds` in the `RequestSplitEvent`, because a reader requests a new split when its current split is done. The reverse does not hold: a reader does not have to request a new split whenever it finishes some splits, as in the bounded Kafka source case.

You have a good point that some sources (like Kafka/Kinesis) may still need to communicate watermark info to the coordinator/enumerator. In that case, it would definitely be a separate type of event (like `WatermarkUpdateEvent`).

In our Iceberg source use cases, readers don't actually report watermarks. They just need to report which splits are finished. But I can see that this may not be a generic enough scenario to justify changing the `RequestSplitEvent` in flink-runtime.

cc [~sundaram]

> piggyback finishedSplitIds in RequestSplitEvent
> -----------------------------------------------
>
>                 Key: FLINK-21364
>                 URL: https://issues.apache.org/jira/browse/FLINK-21364
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Common
>    Affects Versions: 1.12.1
>            Reporter: Steven Zhen Wu
>            Priority: Major
>              Labels: pull-request-available
>
> For some split assignment strategies, the enumerator/assigner needs to track
> the completed splits to advance the watermark for event time alignment or rough
> ordering. Right now, `RequestSplitEvent` for FLIP-27 sources doesn't support
> passing along the `finishedSplitIds` info, and hence we have to create our own
> custom source event type for the Iceberg source.
> Here is the proposal to add such optional info to `RequestSplitEvent`.
> {code}
> public RequestSplitEvent(
>     @Nullable String hostName,
>     @Nullable Collection<String> finishedSplitIds)
> {code}
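
To make the piggyback idea concrete, here is a minimal, self-contained sketch in plain Java. It is not the flink-runtime class and does not use any Flink API: `SplitRequest` and `FinishedSplitTracker` are hypothetical names made up for illustration. The assumption is that a null `finishedSplitIds` simply means "nothing to report", so the extra argument stays optional as in the proposal.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the proposed event shape; not the flink-runtime class. */
final class SplitRequest {
    private final String hostName;               // may be null, as in the proposal
    private final List<String> finishedSplitIds; // never null after construction

    SplitRequest(String hostName, Collection<String> finishedSplitIds) {
        this.hostName = hostName;
        // Assumption: null means the reader has no finished splits to report.
        this.finishedSplitIds = finishedSplitIds == null
                ? Collections.<String>emptyList()
                : Collections.unmodifiableList(new ArrayList<>(finishedSplitIds));
    }

    String hostName() {
        return hostName;
    }

    List<String> finishedSplitIds() {
        return finishedSplitIds;
    }
}

/** Hypothetical enumerator-side bookkeeping for the piggybacked IDs. */
final class FinishedSplitTracker {
    private final Set<String> finished = new LinkedHashSet<>();

    /** Records the IDs carried by the request and returns how many were new. */
    int onSplitRequest(SplitRequest request) {
        int added = 0;
        for (String id : request.finishedSplitIds()) {
            if (finished.add(id)) {
                added++;
            }
        }
        return added;
    }

    boolean isFinished(String splitId) {
        return finished.contains(splitId);
    }

    int finishedCount() {
        return finished.size();
    }
}
```

A reader that only wants a new split would pass null for the second argument. A reader that just completed splits would pass their IDs in the same request, so no extra event is needed. Sources that must report watermarks independently of split requests would still need a separate event, as discussed in the comment.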