Thanks Steven for driving this!

I strongly support deprecating FlinkSource in favor of IcebergSource.
My only concern is doing this for Flink 1.20 only. If it is just a single
default value change, I'm fine with it. On the other hand, bigger
differences between the sources of the different Flink versions would cause
more maintenance headaches in the future for minimal gain.

I understand that Flink itself doesn't guarantee state compatibility
between major/minor versions. If needed, I suggest that we mirror this in
the Iceberg connector and use the documentation to highlight the change
between Iceberg 1.6 and Iceberg 1.7 for users.
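For example, the docs could show Flink SQL users how to keep the old source after upgrading. This is a minimal sketch. It assumes the existing `table.exec.iceberg.use-flip27-source` option keeps its name and that only its default flips to `true` for Flink 1.20.

```sql
-- Opt this SQL session back to the legacy FlinkSource for Iceberg reads.
-- Assumption: the option keeps its current name, and only its default changes in Iceberg 1.7 / Flink 1.20.
SET 'table.exec.iceberg.use-flip27-source' = 'false';
```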

Thanks,
Peter

Fokko Driesprong <fo...@apache.org> wrote on Mon, Aug 12, 2024 at
10:12:

> Hey Steven,
>
> That sounds very exciting! I'm not a heavy Flink user, but I don't see any
> issues with enabling it for Flink 1.20. We should call it out explicitly in
> the changelog and, if possible, give some hints on how to drain the Flink
> jobs.
>
> Kind regards,
> Fokko
>
> On Mon, Aug 12, 2024 at 04:57, Steven Wu <stevenz...@gmail.com> wrote:
>
>>
>> *What*
>>
>> For the next Iceberg 1.7 release with Flink 1.20 support [1], I propose
>> the following changes for *Flink 1.20 only*.
>>
>> 1. Mark the old `FlinkSource` as deprecated and redirect users to the
>> FLIP-27 `IcebergSource` in the Javadoc.
>>
>> 2. Make the FLIP-27 source the default for Flink SQL. Users can still opt
>> back to the old source via config if needed. Because the source
>> implementation and its checkpoint state change, users won't be able to
>> restore from a checkpoint/savepoint when upgrading to Flink 1.20 and
>> Iceberg 1.7. Since Flink doesn't guarantee state compatibility across
>> major/minor Flink version upgrades (e.g., from 1.19 to 1.20) [12], this
>> should be acceptable to Flink SQL users. We should clearly call out the
>> change and the state incompatibility in the release notes.
>>
>> *Why*
>>
>> FLIP-27 is the new source interface introduced by Flink in early 2021.
>> The new FLIP-27 `IcebergSource` implementation [2] was added to Iceberg
>> around mid-2022. It was initially marked @Experimental, and switching to
>> the new API requires a code change. For Flink SQL jobs, the default is
>> still the old `FlinkSource` implementation, and opting in to the FLIP-27
>> `IcebergSource` requires a config change.
>>
>> It has been two years since the FLIP-27 source implementation was first
>> introduced in Iceberg. Now is probably a good time to switch the default
>> to the FLIP-27 source.
>>
>> 1. The community has continued to improve the FLIP-27 source with features
>> such as a JSON serializer for FileScanTask [3], split discovery throttling
>> [4], watermark alignment [5], split enumerator monitoring metrics [6],
>> metadata table reading [8], and speculative execution [9]. These
>> improvements are not available in the old source implementation.
>> 2. We have recently closed the remaining gaps, such as limit pushdown [10]
>> and source parallelism inference for batch execution [11], to achieve
>> feature parity between the old source and the new FLIP-27 source.
>> 3. The FLIP-27 source has been used by many users in production for
>> almost two years now. It has been battle-tested.
>> 4. The old SourceFunction interface has been deprecated since Flink 1.18
>> (Aug 2023) [7].
>>
>>
>> *References*
>> [1] https://github.com/apache/iceberg/pull/10881
>> [2] https://github.com/apache/iceberg/projects/23
>> [3] https://github.com/apache/iceberg/issues/1698
>> [4] https://github.com/apache/iceberg/pull/6299
>> [5] https://github.com/apache/iceberg/pull/8553
>> [6] https://github.com/apache/iceberg/pull/9524
>> [7] https://issues.apache.org/jira/browse/FLINK-28046
>> [8] https://github.com/apache/iceberg/pull/6222
>> [9] https://github.com/apache/iceberg/pull/10548
>> [10] https://github.com/apache/iceberg/pull/10748
>> [11] https://github.com/apache/iceberg/pull/10832
>> [12] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/upgrading/#table-api--sql
>>
>>
