+1 from me. Good idea.

On Mon, Aug 12, 2024 at 9:01 AM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Cool +1 from me then.
>
> On Mon, Aug 12, 2024 at 17:56, Steven Wu <stevenz...@gmail.com> wrote:
>
>> > My only concern is doing this only for Flink 1.20. If this is only a
>> single default value change, I'm fine with it.
>>
>> It is one config change plus Javadoc and @deprecated changes. It is very
>> minimal.
>>
>> I don't see the benefit outweighing the state incompatibility of the
>> switch if we also make the change for Flink 1.18 and 1.19 in the Iceberg
>> 1.7 release. Hence, I would suggest only making the change for Flink 1.20.
>>
>>
>>
>> On Mon, Aug 12, 2024 at 4:38 AM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>>
>>> Thanks Steven for driving this!
>>>
>>> I'm very much for deprecating FlinkSource in favor of IcebergSource.
>>> My only concern is doing this only for Flink 1.20. If this is only a
>>> single default value change, I'm fine with it. OTOH having bigger
>>> differences between the source of the different Flink versions would cause
>>> more maintenance headache in the future for a minimal gain.
>>>
>>> I understand that Flink "natively" doesn't guarantee state compatibility
>>> between major/minor versions. If needed, I suggest that we mirror this with
>>> the Iceberg connector, and use documentation to highlight the change for
>>> the users between Iceberg 1.6 and Iceberg 1.7.
>>>
>>> Thanks,
>>> Peter
>>>
>>> On Mon, Aug 12, 2024 at 10:12, Fokko Driesprong <fo...@apache.org> wrote:
>>>
>>>> Hey Steven,
>>>>
>>>> That sounds very exciting! I'm not a heavy Flink user, but I don't see
>>>> any issues enabling it on Flink 1.20. We should make it explicit in the
>>>> changelog, and if possible give some hints on how to drain the Flink jobs.
>>>>
>>>> Kind regards,
>>>> Fokko
>>>>
>>>> On Mon, Aug 12, 2024 at 04:57, Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>>
>>>>> *What*
>>>>>
>>>>> In the next Iceberg 1.7 release with Flink 1.20 support [1], I
>>>>> am proposing to make the following changes for *Flink 1.20 only*.
>>>>>
>>>>> 1. Mark the old `FlinkSource` as deprecated and redirect users to the
>>>>> FLIP-27 `IcebergSource` in the Javadoc.
>>>>>
>>>>> 2. Make the FLIP-27 source the default for Flink SQL. Users can still
>>>>> opt back in to the old source via config if needed. Because the source
>>>>> implementation and checkpoint state change, users won't be able to
>>>>> restore from a checkpoint/savepoint when upgrading to Flink 1.20 and
>>>>> Iceberg 1.7. As Flink doesn't guarantee state compatibility for new
>>>>> major-minor Flink version upgrades, e.g. from 1.19 to 1.20 [12], this
>>>>> should be acceptable to Flink SQL users. We should clearly call out the
>>>>> change and state incompatibility in the release notes.
>>>>>
>>>>> *Why*
>>>>>
>>>>> FLIP-27 is the new source interface introduced by Flink in early 2021.
>>>>> The new FLIP-27 `IcebergSource` implementation [2] was added to Iceberg
>>>>> around mid-2022. It was initially added as @Experimental, and switching
>>>>> to the new API requires a code change. For Flink SQL jobs, the default is
>>>>> still the old `FlinkSource` implementation, and opting in to the FLIP-27
>>>>> `IcebergSource` requires a config change.
>>>>>
>>>>> It has been two years since the initial introduction of the FLIP-27
>>>>> source implementation in Iceberg. Now is probably a good time to switch
>>>>> the default to the FLIP-27 source.
>>>>>
>>>>> 1. The community has continued to improve the FLIP-27 source, with
>>>>> features like the JSON serializer for FileScanTask [3], split discovery
>>>>> throttling [4], watermark alignment [5], split enumerator monitoring
>>>>> metrics [6], metadata table reading [8], and speculative execution [9].
>>>>> Those improvements are not available in the old source implementation.
>>>>> 2. We have recently closed the remaining gaps, such as limit pushdown
>>>>> [10] and inferring source parallelism [11] for batch execution, to
>>>>> achieve feature parity between the old source and the new FLIP-27 source.
>>>>> 3. The FLIP-27 source has been used by many users in production
>>>>> environments for almost two years now. It has been battle tested.
>>>>> 4. The old SourceFunction interface has been marked as deprecated
>>>>> since Flink 1.18, in Aug 2023 [7].
>>>>>
>>>>>
>>>>> *References*
>>>>> [1] https://github.com/apache/iceberg/pull/10881
>>>>> [2] https://github.com/apache/iceberg/projects/23
>>>>> [3] https://github.com/apache/iceberg/issues/1698
>>>>> [4] https://github.com/apache/iceberg/pull/6299
>>>>> [5] https://github.com/apache/iceberg/pull/8553
>>>>> [6] https://github.com/apache/iceberg/pull/9524
>>>>> [7] https://issues.apache.org/jira/browse/FLINK-28046
>>>>> [8] https://github.com/apache/iceberg/pull/6222
>>>>> [9] https://github.com/apache/iceberg/pull/10548
>>>>> [10] https://github.com/apache/iceberg/pull/10748
>>>>> [11] https://github.com/apache/iceberg/pull/10832
>>>>> [12] https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/ops/upgrading/#table-api--sql
>>>>>
>>>>>

-- 
Ryan Blue
Databricks
