+1 from me. Good idea.

On Mon, Aug 12, 2024 at 9:01 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Cool +1 from me then.
>
> Steven Wu <stevenz...@gmail.com> wrote (on Mon, Aug 12, 2024, 17:56):
>
>> > My only concern is doing this only for Flink 1.20. If this is only a
>> single default value change, I'm fine with it.
>>
>> It is one config change plus Java doc and @deprecated change. It is very
>> minimal.
>>
>> I don't see the benefit outweighing the state incompatibility of the
>> switch if we also make the change for Flink 1.18 and 1.19 in the Iceberg
>> 1.7 release. Hence, I would suggest only making the change for Flink 1.20.
>>
>> On Mon, Aug 12, 2024 at 4:38 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Thanks Steven for driving this!
>>>
>>> I'm very much for deprecating FlinkSource for IcebergSource.
>>> My only concern is doing this only for Flink 1.20. If this is only a
>>> single default value change, I'm fine with it. OTOH having bigger
>>> differences between the source of the different Flink versions would cause
>>> more maintenance headache in the future for a minimal gain.
>>>
>>> I understand that Flink "natively" doesn't guarantee state compatibility
>>> between major/minor versions. If needed, I suggest that we mirror this with
>>> the Iceberg connector, and use documentation to highlight the change for
>>> the users between Iceberg 1.6 and Iceberg 1.7.
>>>
>>> Thanks,
>>> Peter
>>>
>>> Fokko Driesprong <fo...@apache.org> wrote (on Mon, Aug 12, 2024, 10:12):
>>>
>>>> Hey Steven,
>>>>
>>>> That sounds very exciting! I'm not a heavy Flink user, but I don't see
>>>> any issues enabling it on Flink 1.20. We should make it explicit in the
>>>> changelog, and if possible give some hints on how to drain the Flink jobs.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Mon, Aug 12, 2024 at 04:57, Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> *What*
>>>>>
>>>>> In the next Iceberg 1.7 release with Flink 1.20 support [1], I
>>>>> am proposing to make the following changes for *Flink 1.20 only*.
>>>>>
>>>>> 1. Mark the old `FlinkSource` as deprecated and redirect users to the
>>>>>    FLIP-27 `IcebergSource` in the Javadoc.
>>>>>
>>>>> 2. Make the FLIP-27 source the default for Flink SQL. Users can still
>>>>>    opt back into the old source via config if needed. Due to the change of
>>>>>    source implementation and checkpoint state, users won't be able to restore
>>>>>    from checkpoint/savepoint for the upgrade to Flink 1.20 and Iceberg 1.7.
>>>>>    As Flink doesn't guarantee state compatibility for new major-minor Flink
>>>>>    version upgrades, e.g. from 1.19 to 1.20 [12], this should be
>>>>>    acceptable to Flink SQL users. We should clearly call out the change and
>>>>>    state incompatibility in the release notes.
>>>>>
>>>>> *Why*
>>>>>
>>>>> FLIP-27 is the new source interface introduced by Flink in early 2021.
>>>>> The new FLIP-27 `IcebergSource` implementation [2] was added into Iceberg
>>>>> around mid-2022. It was initially added as @Experimental and requires a
>>>>> code change to switch to the new API. For Flink SQL jobs, the default is still
>>>>> the old `FlinkSource` implementation and requires a config change to opt in
>>>>> to the FLIP-27 `IcebergSource`.
>>>>>
>>>>> It has been two years since the initial introduction of the FLIP-27 source
>>>>> implementation in Iceberg. Now is probably a good time to switch the
>>>>> default to the FLIP-27 source.
>>>>>
>>>>> 1. The community has continued to improve the FLIP-27 source, with
>>>>>    a JSON serializer for FileScanTask [3], split discovery throttling [4],
>>>>>    watermark alignment [5], split enumerator monitoring metrics [6], metadata
>>>>>    table reading [8], and speculative execution [9]. Those improvements are not
>>>>>    available in the old source implementation.
>>>>> 2. We have recently closed the remaining gaps, like limit pushdown
>>>>>    [10] and inferring source parallelism [11] for batch execution, to achieve
>>>>>    feature parity between the old source and the new FLIP-27 source.
>>>>> 3. The FLIP-27 source has been used by many users in production
>>>>>    environments for almost two years now. It has been battle tested.
>>>>> 4. The old SourceFunction interface has been marked as deprecated
>>>>>    since Flink 1.18 in Aug 2023 [7].
>>>>>
>>>>> *References*
>>>>> [1] https://github.com/apache/iceberg/pull/10881
>>>>> [2] https://github.com/apache/iceberg/projects/23
>>>>> [3] https://github.com/apache/iceberg/issues/1698
>>>>> [4] https://github.com/apache/iceberg/pull/6299
>>>>> [5] https://github.com/apache/iceberg/pull/8553
>>>>> [6] https://github.com/apache/iceberg/pull/9524
>>>>> [7] https://issues.apache.org/jira/browse/FLINK-28046
>>>>> [8] https://github.com/apache/iceberg/pull/6222
>>>>> [9] https://github.com/apache/iceberg/pull/10548
>>>>> [10] https://github.com/apache/iceberg/pull/10748
>>>>> [11] https://github.com/apache/iceberg/pull/10832
>>>>> [12] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/upgrading/#table-api--sql
>>>>>

--
Ryan Blue
Databricks