gaoyunhaii commented on pull request #18805:
URL: https://github.com/apache/flink/pull/18805#issuecomment-1059720777


   Hi Fabian~ thanks for the clarification!
   
   For the `filterRecoveredCommittables`, it was originally motivated by data 
lake systems like Iceberg: if global committables from more than one 
checkpoint were not committed before a failover, it should be possible to merge 
them into one global committable on restore, so that the restore process can be 
accelerated [1].
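   
   A minimal sketch of the merge-on-restore idea, to make the intent concrete. 
It does not use the Flink interfaces: `GlobalCommittable`, its fields, and the 
merge rule (keep the highest checkpoint id, concatenate the file lists) are 
assumptions made up for illustration, and only the method name mirrors 
`filterRecoveredCommittables`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RecoveredCommittableMerge {

    /** Hypothetical global committable: the files pending for one checkpoint. */
    static final class GlobalCommittable {
        final long checkpointId;
        final List<String> dataFiles;

        GlobalCommittable(long checkpointId, List<String> dataFiles) {
            this.checkpointId = checkpointId;
            this.dataFiles = Collections.unmodifiableList(new ArrayList<>(dataFiles));
        }
    }

    /**
     * Folds every recovered committable into a single one, so that the
     * restore path issues one commit instead of one per checkpoint.
     */
    static List<GlobalCommittable> filterRecoveredCommittables(
            List<GlobalCommittable> recovered) {
        if (recovered.size() <= 1) {
            return recovered; // nothing to merge
        }
        long maxCheckpointId = Long.MIN_VALUE;
        List<String> allFiles = new ArrayList<>();
        for (GlobalCommittable committable : recovered) {
            maxCheckpointId = Math.max(maxCheckpointId, committable.checkpointId);
            allFiles.addAll(committable.dataFiles);
        }
        return Collections.singletonList(new GlobalCommittable(maxCheckpointId, allFiles));
    }

    public static void main(String[] args) {
        List<GlobalCommittable> recovered = new ArrayList<>();
        recovered.add(new GlobalCommittable(7L, Collections.singletonList("file-a")));
        recovered.add(new GlobalCommittable(8L, Collections.singletonList("file-b")));
        List<GlobalCommittable> merged = filterRecoveredCommittables(recovered);
        System.out.println(merged.size() + " committable(s) to commit after restore");
    }
}
```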
   
   For the `combine`, sorry, I overlooked the issue and the current 
implementation; currently it is indeed called before committing. The issue 
says that the committables should be combined on `snapshotState` and the 
resulting global committables should be persisted, so that if there are 
failovers and re-commits, `combine` does not need to be executed again.
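   
   A rough sketch of combining at snapshot time, as the issue describes it. 
This is not the `CommittableManager` code: the class, the string-based 
committables, and the join-based combine are hypothetical stand-ins, and the 
value returned by `snapshotState` stands for what would be written to state.

```java
import java.util.ArrayList;
import java.util.List;

final class SnapshotTimeCombiner {

    private final List<String> pendingCommittables = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();
    private int combineCalls = 0;

    /** Collects a committable produced since the last snapshot. */
    void addCommittable(String committable) {
        pendingCommittables.add(committable);
    }

    /**
     * Combines the pending committables once, at snapshot time. The returned
     * global committable is what would be persisted in the checkpoint.
     */
    String snapshotState() {
        combineCalls++;
        String globalCommittable = String.join("+", pendingCommittables);
        pendingCommittables.clear();
        return globalCommittable;
    }

    /** Commits an already combined global committable, fresh or restored. */
    void commit(String globalCommittable) {
        committed.add(globalCommittable);
    }

    int combineCalls() {
        return combineCalls;
    }

    List<String> committed() {
        return new ArrayList<>(committed);
    }

    public static void main(String[] args) {
        SnapshotTimeCombiner beforeFailover = new SnapshotTimeCombiner();
        beforeFailover.addCommittable("file-a");
        beforeFailover.addCommittable("file-b");
        String persisted = beforeFailover.snapshotState();

        // Simulated failover: a new instance re-commits the persisted result.
        SnapshotTimeCombiner afterRestore = new SnapshotTimeCombiner();
        afterRestore.commit(persisted);
        System.out.println("combine calls after restore: " + afterRestore.combineCalls());
    }
}
```

   Because the combined result is part of the snapshot, the restored instance 
only needs `commit`; the combine step is never repeated for that checkpoint.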
   
   If we want to call `filterRecoveredCommittables` for each snapshot, we might 
need to add a parameter `isRestored` to the `CommittableManager#commit`. If we 
also want to keep the same behavior for the `combine` method, we might need to 
make some more modifications to the `CommittableManager` or not reuse the 
current implementation.
   
   However, the sink implementations in `iceberg`, `hudi` and 
`flink-table-store` currently use customized operators / the legacy sink API 
instead of the sink API v1. In addition, the two issues should mainly affect 
performance rather than correctness. Finally, users are still able to add 
their own global committer implementation with the topology API. Thus I think 
it is still useful to solve the two issues, but if we do not have the capacity 
we might postpone solving them.
   
   If so, perhaps we could create a new jira issue for state compatibility with 
the sink v1 and attach this PR to that issue, and downgrade FLINK-26173 and 
leave it open? Since we have not fully fixed the issues listed in 
FLINK-26173~
   
   [1] https://lists.apache.org/thread/3tqgnhclbr8ptb69b2ro74535dtbd608


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.