Small update: I'm facing some Kryo serializer issues after moving the
code to the final 2.0.0 release. This is caused by a Kryo update from
2.24.0 to 5.6.2. Details on this issue are in this JIRA:
https://issues.apache.org/jira/browse/FLINK-37546

The port to 2.0.0 worked well otherwise, including porting FlinkSource
which uses Flink interfaces obsolete in Flink 2.0.0. The official
community guideline is to refrain from using these interfaces. They
have been kept around mainly due to keeping support for the
StateProcessor API, which allows reading pre-2.0.0 Flink state. The
official release announcement actively discourages usage:

> NOTICE: You may find some of the removed APIs still exist in the code base, 
> usually in a different package. They are for internal usages only and can be 
> changed / removed anytime without notifications. Please DO NOT USE them.
https://flink.apache.org/2025/03/24/apache-flink-2.0.0-a-new-era-of-real-time-data-processing/

Based on that, I would suggest removing FlinkSource. We have an
agreement on that.

Cheers,
Max

On Fri, Mar 14, 2025 at 3:14 PM Maximilian Michels <m...@apache.org> wrote:
>
> Thanks everyone for the well-considered discussion!
>
> I think it is manageable complexity to backport code removed in other
> Flink versions. That's why I wouldn't make a decision based solely on
> more work for backporting.
>
> Nevertheless, I would advocate for a consistent code base across all
> Flink versions. It is strange to have FlinkSink available for one
> Flink version, but not for another, inside the same Iceberg release.
> It would be the most consistent user experience to at-once remove all
> instances of FlinkSink / FlinkSource across all Flink versions, when
> the time is right. Now, we didn't know whether that would be possible,
> but it turns out that Flink 2.0 is rather backwards-compatible. Here
> is the Flink 2.0 PR: https://github.com/apache/iceberg/pull/12527
>
> Surveying users is a great idea. We probably don't have a good means
> of doing that, as Iceberg doesn't phone home, but some of the larger
> users might be able to.
>
> -Max
>
> On Fri, Mar 14, 2025 at 2:55 AM Steven Wu <stevenz...@gmail.com> wrote:
> >
> > Thanks Peter for the input.
> >
> > I understand the valid concern of potential large diff if we only delete 
> > the `FlinkSource` in the flink 2.0 module. It does make the diff across 
> > flink versions's modules unusable. For backporting, I thought you had a 
> > process that doesn't depend on the diff of directories (apply patches or 
> > sth). Anyway, I am open to delete `FlinkSource` across all versions for the 
> > concern that Peter brought up.
> >
> >
> > On Wed, Mar 12, 2025 at 11:55 PM Péter Váry <peter.vary.apa...@gmail.com> 
> > wrote:
> >>
> >> I agree with the general sentiment, that we should remove FlinkSource, but 
> >> only deprecate FlinkSink.
> >>
> >> I understand Steven's approach to remove the deprecated classes only from 
> >> the latest Flink version, but keep them for the old versions, but I have 
> >> some concerns with this.
> >>
> >> My main concern is that if we remove all the code related to the old 
> >> FlinkSource from the latent Flink version (production code and test 
> >> cleanup too), then the different Flink versions will have very  different 
> >> code bases. This will make the backports to older Flink versions painful 
> >> until the full deprecation.
> >>
> >> I see the following possibilities:
> >> 1. Bite the bullet, and accept the backport complexity increase
> >> 2. Remove only the main class, and remove other classes only at full 
> >> deprecation
> >> 3. After surveying the userbase deprecate the code for every versions
> >>
> >> Since the FlinkSource is deprecated for a long time, I would aim for 
> >> conserving development resources and chose option 3, unless there are 
> >> objections from the userbase.
> >>
> >> On Wed, Mar 12, 2025, 18:45 Rodrigo Meneses <rmene...@gmail.com> wrote:
> >>>
> >>> Once we deprecate FlinkSink, we should also upgrade IcebergSink from 
> >>> `Experimental` to `PublicEvolving`
> >>>
> >>>
> >>> On Wed, Mar 12, 2025 at 10:01 AM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>
> >>>> Max, thanks for confirming the Flink 2.0 behavior. With the latest 
> >>>> information, here are my suggestions
> >>>>
> >>>> 1) I am fine with deleting the old `FlinkSource` in the upcoming flink 
> >>>> 2.0 module (but keeping them in flink 1.x modules). As discussed in a 
> >>>> previous thread, the new FLIP-27 source has been baked by users for 2 
> >>>> years. We already switched SQL to the new FLIP-27 source in the Iceberg 
> >>>> 1.7 release. So far so good.
> >>>>
> >>>> 2) for FlinkSink, I would prefer to keep it around with flink 2.0 
> >>>> module. Here are my reasons.
> >>>> - new sink is still relatively young. Give it more time to bake and 
> >>>> mature.
> >>>> - Flink sink (for streaming ingestion) is probably a lot more widely 
> >>>> used than Flink source (for batch or streaming read). Hence, it is 
> >>>> valuable to be more conservative in the deprecation plan here.
> >>>> - we can stop adding new features to the old FlinkSink and only 
> >>>> contribute to the new IcebergSink. that should reduce the maintenance 
> >>>> burden.
> >>>> - we can mark the FinkSink as deprecated for all flink versions once the 
> >>>> parity is achieved.
> >>>>
> >>>>
> >>>> On Wed, Mar 12, 2025 at 7:09 AM Maximilian Michels <m...@apache.org> 
> >>>> wrote:
> >>>>>
> >>>>> Hi JB,
> >>>>>
> >>>>> V3 is another topic, but it would be good to start looking into that.
> >>>>>
> >>>>> Hey Steven,
> >>>>>
> >>>>> Flink 2.0 actually did not remove the old interfaces, which was not
> >>>>> what I expected when I started this conversation. Turns out, old
> >>>>> interfaces have been moved to ".legacy" packages. I'm in the process
> >>>>> of verifying they will continue to work as expected.
> >>>>>
> >>>>> It looks like we will still have the option to keep around the old
> >>>>> source / sink implementations. However, I think it still makes sense
> >>>>> to remove FlinkSource and FlinkSink in the Flink 2.0 code path. This
> >>>>> will simplify the code base and avoid confusion for users in the long
> >>>>> run. I don't see real value in bringing over legacy sources / sinks to
> >>>>> a new Flink major release.
> >>>>>
> >>>>> -Max
> >>>>>
> >>>>> On Tue, Mar 11, 2025 at 10:46 PM Steven Wu <stevenz...@gmail.com> wrote:
> >>>>> >
> >>>>> > I assume Flink 2.0 will remove the old source and sink interfaces.
> >>>>> >
> >>>>> > With the above assumption (please correct me if I am wrong), here is 
> >>>>> > what I would suggest for the next Iceberg release (1.9 or whichever 
> >>>>> > version that lands the Flink 2.0 support).
> >>>>> > - Flink 2.0 module - remove the old source and sink impls. we have no 
> >>>>> > choice here.
> >>>>> > - Flink 1.x modules - update the deprecation note for old 
> >>>>> > `FlinkSource` that will be removed in Flink 2.0 support. add a 
> >>>>> > similar deprecation node for old `FlinkSink`.
> >>>>> >
> >>>>> > We shouldn't remove the old `FlinkSource` and `FlinkSink` for the 
> >>>>> > Flink 1.x modules. Allow users to continue to use them with Flink 
> >>>>> > 1.x. Note that Iceberg supports 3 active Flink versions (currently 
> >>>>> > 1.18, 1.19, 1.20). So old versions will be naturally retired over 
> >>>>> > time. As Flink progresses with 2.0, 2.1, 2.2 (etc.), we may want to 
> >>>>> > keep the support for the last Flink 1.x version a little longer based 
> >>>>> > on the community feedback.
> >>>>> >
> >>>>> > Thanks,
> >>>>> > Steven
> >>>>> >
> >>>>> > On Tue, Mar 11, 2025 at 8:48 AM Jean-Baptiste Onofré 
> >>>>> > <j...@nanthrax.net> wrote:
> >>>>> >>
> >>>>> >> Hi Max
> >>>>> >>
> >>>>> >> Yes, it makes sense. Maybe I can help on that as I'm working on Spec
> >>>>> >> V3 support in Flink (I did the default value, and I'm checking the 
> >>>>> >> new
> >>>>> >> types now).
> >>>>> >>
> >>>>> >> Please let me know :)
> >>>>> >>
> >>>>> >> Thanks,
> >>>>> >> Regards
> >>>>> >> JB
> >>>>> >>
> >>>>> >> On Tue, Mar 11, 2025 at 11:23 AM Maximilian Michels 
> >>>>> >> <m...@apache.org> wrote:
> >>>>> >> >
> >>>>> >> > Hi JB,
> >>>>> >> >
> >>>>> >> > Yes, Flink 2.0 support ideally should land in Iceberg 1.9.
> >>>>> >> >
> >>>>> >> > Hi Rodrigo,
> >>>>> >> >
> >>>>> >> > +1 on closing the gap between the two sink implementations: 
> >>>>> >> > FlinkSink
> >>>>> >> > and IcebergSink. Thanks for helping to close that gap!
> >>>>> >> >
> >>>>> >> > Hi Steven,
> >>>>> >> >
> >>>>> >> > Good point on the "Remove by Iceberg 2.0" claim stated in 
> >>>>> >> > FlinkSource.
> >>>>> >> > We can keep it until Iceberg 2.0 in the Flink 1.X paths, but I 
> >>>>> >> > would
> >>>>> >> > propose to remove FlinkSource earlier than Iceberg 2.0 for the 
> >>>>> >> > Flink
> >>>>> >> > 2.X path. The reason is that it is not used by default for many
> >>>>> >> > Iceberg versions. I think it is unlikely that users jumping on 
> >>>>> >> > Flink
> >>>>> >> > 2.0 will want to use it.
> >>>>> >> >
> >>>>> >> > As for removing FlinkSink, you're right that IcebergSink is 
> >>>>> >> > relatively
> >>>>> >> > new as a replacement. If we can keep it in the process of porting 
> >>>>> >> > the
> >>>>> >> > code, we may keep it around longer, but I suggest deprecating it 
> >>>>> >> > and
> >>>>> >> > scheduling it for removal in the Iceberg release following 1.9.
> >>>>> >> >
> >>>>> >> > To summarize, I'm proposing the following:
> >>>>> >> >
> >>>>> >> > Iceberg 1.9
> >>>>> >> > - Remove FlinkSource in Flink 2.0 code path
> >>>>> >> > - Deprecate FlinkSink for all supported Flink versions
> >>>>> >> >
> >>>>> >> > Iceberg > 1.9
> >>>>> >> > - Remove FlinkSource for all supported Flink versions
> >>>>> >> > - Remove FlinkSink for all supported Flink versions
> >>>>> >> >
> >>>>> >> > Does that sound right?
> >>>>> >> >
> >>>>> >> > Cheers,
> >>>>> >> > Max
> >>>>> >> >
> >>>>> >> > On Thu, Mar 6, 2025 at 9:10 PM Steven Wu <stevenz...@gmail.com> 
> >>>>> >> > wrote:
> >>>>> >> > >
> >>>>> >> > > > if Flink 2.0 release is going to release the old source and 
> >>>>> >> > > > sink interfaces, t
> >>>>> >> > >
> >>>>> >> > > typo above: "release the old" -> "delete the old"
> >>>>> >> > >
> >>>>> >> > > On Thu, Mar 6, 2025 at 12:08 PM Steven Wu <stevenz...@gmail.com> 
> >>>>> >> > > wrote:
> >>>>> >> > >>
> >>>>> >> > >> There was a previous thread for the Flink source. The Javadoc 
> >>>>> >> > >> deprecation note for the old `FlinkSource` currently says it 
> >>>>> >> > >> will be removed in Iceberg 2.0 release
> >>>>> >> > >> https://lists.apache.org/thread/27kcvo3p86pysk9wrggq4vphzo03sv3l
> >>>>> >> > >>
> >>>>> >> > >> Now regarding deprecating and removing the old `FlinkSink` in 
> >>>>> >> > >> favor of the new `IcebergSink`, the new `IcebergSink` was added 
> >>>>> >> > >> 6 months ago and released in Iceberg 1.7 (current release is 
> >>>>> >> > >> 1.8). Not sure how many users have got a chance to use it. 
> >>>>> >> > >> Ideally, I would like to have the new `IcebergSink` bake longer 
> >>>>> >> > >> with more users running it in the production environments. 
> >>>>> >> > >> There is also the parity problems that Rod mentioned.
> >>>>> >> > >>
> >>>>> >> > >> However, if Flink 2.0 release is going to release the old 
> >>>>> >> > >> source and sink interfaces, then we will have no choice and 
> >>>>> >> > >> remove the old source and sink implementations earlier than we 
> >>>>> >> > >> originally planned/preferred.
> >>>>> >> > >>
> >>>>> >> > >>
> >>>>> >> > >> On Thu, Mar 6, 2025 at 8:40 AM Rodrigo Meneses 
> >>>>> >> > >> <rmene...@gmail.com> wrote:
> >>>>> >> > >>>
> >>>>> >> > >>> Hi Max, +1 on that too. IcebergSink has been hanging around 
> >>>>> >> > >>> for a while now. We want to make sure we have the same 
> >>>>> >> > >>> features, though:
> >>>>> >> > >>>
> >>>>> >> > >>> I’ve got https://github.com/apache/iceberg/pull/12071 that 
> >>>>> >> > >>> adds the Range Distribution Mode to IcebergSink. It still 
> >>>>> >> > >>> needs a few recent bug fixes and features backported, but it 
> >>>>> >> > >>> should be ready soon.
> >>>>> >> > >>>
> >>>>> >> > >>>
> >>>>> >> > >>>
> >>>>> >> > >>> On Thu, Mar 6, 2025 at 8:18 AM Jean-Baptiste Onofré 
> >>>>> >> > >>> <j...@nanthrax.net> wrote:
> >>>>> >> > >>>>
> >>>>> >> > >>>> Hi Max,
> >>>>> >> > >>>>
> >>>>> >> > >>>> I guess you are proposing to remove FlinkSink and the 
> >>>>> >> > >>>> corresponding
> >>>>> >> > >>>> FlinkSource as well.
> >>>>> >> > >>>>
> >>>>> >> > >>>> It makes to me as, I saw in the code, that both FlinkSink and
> >>>>> >> > >>>> FlinkSource are deprecated for a while.
> >>>>> >> > >>>>
> >>>>> >> > >>>> So, +1 to remove it.
> >>>>> >> > >>>>
> >>>>> >> > >>>> Are you planning this for 1.9.0 ?
> >>>>> >> > >>>>
> >>>>> >> > >>>> Regards
> >>>>> >> > >>>> JB
> >>>>> >> > >>>>
> >>>>> >> > >>>> On Thu, Mar 6, 2025 at 4:21 PM Maximilian Michels 
> >>>>> >> > >>>> <m...@apache.org> wrote:
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > Hi,
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > Today there are two Flink write connectors in Iceberg:
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > 1. FlinkSink (original sink, based on Flink legacy 
> >>>>> >> > >>>> > interfaces)
> >>>>> >> > >>>> > 2. IcebergSink (newer version, based on modern Flink API)
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > In terms of features, (1) is a subset of (2).
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > I'm in the process of adding support for Flink 2.0. The 
> >>>>> >> > >>>> > interfaces
> >>>>> >> > >>>> > used for (1) have been deprecated for several Flink 
> >>>>> >> > >>>> > versions and are
> >>>>> >> > >>>> > removed / discouraged in Flink 2.0.
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > Therefore, I would like to propose to remove FlinkSink for 
> >>>>> >> > >>>> > the Flink
> >>>>> >> > >>>> > 2.0 Iceberg module. We have already deprecated FlinkSink 
> >>>>> >> > >>>> > for a while.
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > Any objections?
> >>>>> >> > >>>> >
> >>>>> >> > >>>> > -Max

Reply via email to