Hi David,

Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]
Best regards,

Martijn

[1] https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines

On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <trohrm...@apache.org> wrote:

> Hi David,
>
> I think we haven't updated our Hadoop dependencies in a long time, so it
> is probably time to do so. +1 for upgrading to the latest patch release.
>
> If newer 2.x Hadoop versions are compatible with 2.y for x >= y, then I
> don't see a problem with dropping support for pre-bundled Hadoop versions
> < 2.8. This could indeed help us shrink our build matrix a bit and, thus,
> save some build time.
>
> Concerning simplifying our code base to get rid of the reflection logic
> etc., we might still have to add a safeguard for features that are not
> supported by earlier versions. According to the docs:
>
> > YARN applications that attempt to use new APIs (including new fields in
> > data structures) that have not yet been deployed to the cluster can
> > expect link exceptions
>
> so we can run into link exceptions. We could get around this by saying
> that Flink no longer supports Hadoop < 2.8, but this should at least be
> checked with our users on the user ML.
>
> Cheers,
> Till
>
> On Tue, Dec 14, 2021 at 9:25 AM David Morávek <d...@apache.org> wrote:
>
> > Hi,
> >
> > I'd like to start a discussion about upgrading the minimal Hadoop
> > version that Flink supports.
> >
> > Even though the default value for the `hadoop.version` property is set
> > to 2.8.3, we still ensure both runtime and compile compatibility with
> > Hadoop 2.4.x in the scheduled pipeline [1].
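[Editor's note: for readers unfamiliar with the `hadoop.version` property mentioned above, Flink's build lets you compile against a specific Hadoop release by overriding that Maven property. A sketch of the invocation, with the version number chosen purely for illustration:]

```shell
# Build Flink against a specific Hadoop version (version is illustrative).
mvn clean install -DskipTests -Dhadoop.version=2.8.5
```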
> >
> > Here is a list of the latest release dates for each minor version up
> > to 2.8.x:
> >
> > - Hadoop 2.4.1: last commit on 6/30/2014
> > - Hadoop 2.5.2: last commit on 11/15/2014
> > - Hadoop 2.6.5: last commit on 10/11/2016
> > - Hadoop 2.7.7: last commit on 7/18/2018
> > - Hadoop 2.8.5: last commit on 9/8/2018
> >
> > Since then there have been two more minor releases on the 2.x branch
> > and four more minor releases on the 3.x branch.
> >
> > Supporting the older versions involves reflection-based "hacks" to
> > cover multiple versions.
> >
> > My proposal would be to change the minimum supported version *to
> > 2.8.5*. This should simplify the Hadoop-related codebase and the CI
> > build infrastructure, as we won't have to test against the older
> > versions.
> >
> > Please note that this only concerns minimal *client-side*
> > compatibility. The wire protocol should remain compatible with earlier
> > versions [2], so we should be able to talk to any server in the 2.x
> > major branch.
> >
> > One small note on the 2.8.x branch: some of the classes we need are
> > only available in 2.8.4 and above, but I'm not sure we should factor
> > an eventual patch-version upgrade into the decision here, because both
> > 2.8.4 and 2.8.5 are already pretty old.
> >
> > WDYT, is it already time to upgrade? Looking forward to any thoughts
> > on the topic!
> >
> > [1]
> > https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
> > [2]
> > https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
> >
> > Best,
> > D.
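[Editor's note: the reflection-based "hacks" mentioned in the thread typically probe at runtime whether a class or method exists before calling it, so one binary can run against several Hadoop versions. A minimal standalone sketch of the pattern, probing a JDK class instead of a real Hadoop API so it runs without a Hadoop dependency:]

```java
public class HadoopCompat {
    /**
     * Returns true if the named class exists and exposes the given public
     * method. Version-compatibility layers use this kind of probe to guard
     * calls to APIs that only exist in newer releases; the class and method
     * names passed in here are up to the caller.
     */
    static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> clazz = Class.forName(className);
            clazz.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            // Class or method is absent in this environment: fall back.
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe a JDK class so this sketch is self-contained.
        System.out.println(hasMethod("java.lang.String", "trim"));           // true
        System.out.println(hasMethod("java.lang.String", "notARealMethod")); // false
    }
}
```

The same probe-then-dispatch shape is what dropping pre-2.8 support would let the codebase delete: once every supported Hadoop version has the API, the call can be made directly.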