Thanks, Steven, for driving this! I'm very much in favor of deprecating FlinkSource in favor of IcebergSource. My only concern is doing this for Flink 1.20 only. If this is just a single default value change, I'm fine with it. OTOH, larger differences between the source implementations for the different Flink versions would cause more maintenance headaches in the future for minimal gain.
I understand that Flink "natively" doesn't guarantee state compatibility between major/minor versions. If needed, I suggest that we mirror this in the Iceberg connector and use the documentation to highlight the change for users between Iceberg 1.6 and Iceberg 1.7.

Thanks,
Peter

On Mon, Aug 12, 2024 at 10:12, Fokko Driesprong <fo...@apache.org> wrote:

> Hey Steven,
>
> That sounds very exciting! I'm not a heavy Flink user, but I don't see any
> issues with enabling it on Flink 1.20. We should make it explicit in the
> changelog and, if possible, give some hints on how to drain the Flink jobs.
>
> Kind regards,
> Fokko
>
> On Mon, Aug 12, 2024 at 04:57, Steven Wu <stevenz...@gmail.com> wrote:
>
>>
>> *What*
>>
>> In the next Iceberg 1.7 release with Flink 1.20 support [1], I
>> am proposing to make the following changes for *Flink 1.20 only*.
>>
>> 1. Mark the old `FlinkSource` as deprecated and redirect users to the
>> FLIP-27 `IcebergSource` in the Javadoc.
>>
>> 2. Make the FLIP-27 source the default for Flink SQL. Users can still opt
>> back to the old source via config if needed. Due to the change of source
>> implementation and checkpoint state, users won't be able to restore from a
>> checkpoint/savepoint when upgrading to Flink 1.20 and Iceberg 1.7. As
>> Flink doesn't guarantee state compatibility for major-minor Flink
>> version upgrades, e.g. from 1.19 to 1.20 [12], this should be acceptable
>> to Flink SQL users. We should clearly call out the change and the state
>> incompatibility in the release notes.
>>
>> *Why*
>>
>> FLIP-27 is the new source interface introduced by Flink in early 2021.
>> The new FLIP-27 `IcebergSource` implementation [2] was added to Iceberg
>> around mid-2022. It was initially marked @Experimental and required a
>> code change to switch to the new API. For Flink SQL jobs, the default is
>> still the old `FlinkSource` implementation, and a config change is required
>> to opt in to the FLIP-27 `IcebergSource`.
>>
>> It has been two years since the initial introduction of the FLIP-27 source
>> implementation in Iceberg. Now is probably a good time to switch the
>> default to the FLIP-27 source.
>>
>> 1. The community has continued to improve the FLIP-27 source, adding a JSON
>> serializer for FileScanTask [3], split discovery throttling [4], watermark
>> alignment [5], split enumerator monitoring metrics [6], metadata table
>> reading [8], and speculative execution [9]. These improvements are not
>> available in the old source implementation.
>> 2. We have recently closed the remaining gaps, such as limit pushdown [10]
>> and inferring source parallelism [11] for batch execution, to achieve
>> feature parity between the old source and the new FLIP-27 source.
>> 3. The FLIP-27 source has been used by many users in production
>> for almost two years now. It has been battle-tested.
>> 4. The old SourceFunction interface has been marked as deprecated since
>> Flink 1.18 in Aug 2023 [7].
>>
>>
>> *References*
>> [1] https://github.com/apache/iceberg/pull/10881
>> [2] https://github.com/apache/iceberg/projects/23
>> [3] https://github.com/apache/iceberg/issues/1698
>> [4] https://github.com/apache/iceberg/pull/6299
>> [5] https://github.com/apache/iceberg/pull/8553
>> [6] https://github.com/apache/iceberg/pull/9524
>> [7] https://issues.apache.org/jira/browse/FLINK-28046
>> [8] https://github.com/apache/iceberg/pull/6222
>> [9] https://github.com/apache/iceberg/pull/10548
>> [10] https://github.com/apache/iceberg/pull/10748
>> [11] https://github.com/apache/iceberg/pull/10832
>> [12] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/upgrading/#table-api--sql
>>
>>