Hi David,

Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]

Best regards,

Martijn

[1]
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines

On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi David,
>
> I think we haven't updated our Hadoop dependencies in a long time. Hence,
> it is probably time to do so. So +1 for upgrading to the latest patch
> release.
>
> If newer Hadoop versions 2.x are compatible with 2.y for x >= y, then I
> don't see a problem with dropping support for pre-bundled Hadoop versions
> < 2.8. This could indeed help us decrease our build matrix a bit and,
> thus, save some build time.
>
> Concerning simplifying our code base to get rid of the reflection logic
> etc., we might still have to add a safeguard for features that are not
> supported by earlier versions. According to the docs
>
> > YARN applications that attempt to use new APIs (including new fields in
> data structures) that have not yet been deployed to the cluster can expect
> link exceptions
>
> we can see link exceptions. We could get around this by saying that Flink
> no longer supports Hadoop < 2.8. But this should be checked with our users
> on the user ML at least.
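For context, such a safeguard could look roughly like the following sketch. The class and interface names are purely illustrative (not actual Flink code): a call into a newer YARN API is wrapped so that a link exception on an older cluster degrades gracefully instead of failing the whole client.

```java
// Hypothetical sketch of a link-exception safeguard; YarnFeatureGuard and
// NewApiCall are illustrative names, not actual Flink classes.
public class YarnFeatureGuard {

    @FunctionalInterface
    interface NewApiCall {
        void run() throws Exception;
    }

    /**
     * Runs a call that may use APIs absent from older Hadoop versions.
     * Returns false if the API cannot be linked, so the caller can fall
     * back to an older code path instead of crashing.
     */
    static boolean tryNewApi(NewApiCall call) {
        try {
            call.run();
            return true;
        } catch (NoClassDefFoundError | NoSuchMethodError e) {
            // The class or method is missing in the deployed Hadoop version.
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```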
>
> Cheers,
> Till
>
> On Tue, Dec 14, 2021 at 9:25 AM David Morávek <d...@apache.org> wrote:
>
> > Hi,
> >
> > I'd like to start a discussion about upgrading the minimal Hadoop version
> > that Flink supports.
> >
> > Even though the default value of the `hadoop.version` property is set to
> > 2.8.3, we're still ensuring both runtime and compile-time compatibility
> > with Hadoop 2.4.x in the scheduled pipeline [1].
> >
> > Here is a list of the latest release dates for each minor version up to
> > 2.8.x:
> >
> > - Hadoop 2.4.1: Last commit on 6/30/2014
> > - Hadoop 2.5.2: Last commit on 11/15/2014
> > - Hadoop 2.6.5: Last commit on 10/11/2016
> > - Hadoop 2.7.7: Last commit on 7/18/2018
> > - Hadoop 2.8.5: Last commit on 9/8/2018
> >
> > Since then, there have been two more minor releases in the 2.x branch and
> > four more minor releases in the 3.x branch.
> >
> > Supporting the older versions involves reflection-based "hacks" to handle
> > the multiple versions.
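To illustrate, a minimal sketch of what such a reflection-based hack typically looks like (the probed class and method here are just an example of the pattern, not Flink's actual code):

```java
// Minimal sketch of a reflection-based version probe; not Flink's actual code.
public class HadoopCompat {

    /** True if the class exists on the classpath and declares the method. */
    static boolean hasMethod(String className, String methodName, Class<?>... params) {
        try {
            Class<?> clazz = Class.forName(className);
            clazz.getMethod(methodName, params);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe an API that only newer Hadoop versions ship; if it's absent,
        // take a legacy code path instead of failing with a link error.
        if (hasMethod("org.apache.hadoop.conf.Configuration", "getPassword", String.class)) {
            System.out.println("new API available");
        } else {
            System.out.println("falling back to legacy path");
        }
    }
}
```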
> >
> > My proposal would be to change the minimum supported version *to 2.8.5*.
> > This should simplify the Hadoop-related codebase and the CI build
> > infrastructure, as we won't have to test against the older versions.
> >
> > Please note that this only involves minimal *client-side* compatibility.
> > The wire protocol should remain compatible with earlier versions [2], so
> > we should be able to talk to any server in the 2.x major branch.
> >
> > One small note on the 2.8.x branch: some of the classes we need are only
> > available in version 2.8.4 and above, but I'm not sure we should factor
> > in an eventual need to upgrade a patch version here, because both 2.8.4
> > and 2.8.5 are pretty old.
> >
> > WDYT, is it already time to upgrade? Looking forward to any thoughts on
> > the topic!
> >
> > [1]
> > https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
> > [2]
> > https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
> >
> > Best,
> > D.
> >
>