Hi David,

I think we haven't updated our Hadoop dependencies in a long time, so it is
probably time to do so. +1 for upgrading to the latest patch release.

If newer 2.x Hadoop versions are compatible with 2.y for x >= y, then I don't
see a problem with dropping support for pre-bundled Hadoop versions < 2.8.
This could indeed decrease our build matrix a bit and, thus, save some build
time. Concerning simplifying our code base to get rid of the reflection logic
etc., we might still have to add a safeguard for features that are not
supported by earlier versions. According to the docs,

> YARN applications that attempt to use new APIs (including new fields in
> data structures) that have not yet been deployed to the cluster can expect
> link exceptions

we can see link exceptions. We could get around this by saying that Flink no
longer supports Hadoop < 2.8, but this should at least be checked with our
users on the user ML.
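Just to make the safeguard idea a bit more concrete, something along these
lines is what I have in mind (a rough sketch only; the probed class and
method are just an illustrative example, not the actual Flink code): probe
for the newer API via reflection and fall back, or fail with a clear message,
instead of a direct call blowing up with a link error at runtime.

public final class HadoopFeatureProbe {

    // Probe whether a YARN API that only exists in newer Hadoop 2.x releases
    // is on the classpath (illustrative example: node label support). With an
    // older client on the classpath this returns false, instead of a direct
    // call failing later with a NoSuchMethodError.
    public static boolean supportsNodeLabelExpression() {
        try {
            Class<?> clazz = Class.forName(
                "org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext");
            clazz.getMethod("setNodeLabelExpression", String.class);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    private HadoopFeatureProbe() {}
}

If we raise the minimum to 2.8.x, most of these probes can become plain
method calls, which is exactly the simplification David is after.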
Cheers,
Till

On Tue, Dec 14, 2021 at 9:25 AM David Morávek <d...@apache.org> wrote:

> Hi,
>
> I'd like to start a discussion about upgrading the minimal Hadoop version
> that Flink supports.
>
> Even though the default value for the `hadoop.version` property is set to
> 2.8.3, we're still ensuring both runtime and compile compatibility with
> Hadoop 2.4.x with the scheduled pipeline [1].
>
> Here is a list of the dates of the latest releases for each minor version
> up to 2.8.x:
>
> - Hadoop 2.4.1: Last commit on 6/30/2014
> - Hadoop 2.5.2: Last commit on 11/15/2014
> - Hadoop 2.6.5: Last commit on 10/11/2016
> - Hadoop 2.7.7: Last commit on 7/18/2018
> - Hadoop 2.8.5: Last commit on 9/8/2018
>
> Since then there have been two more minor releases in the 2.x branch and
> four more minor releases in the 3.x branch.
>
> Supporting the older versions involves reflection-based "hacks" for
> supporting multiple versions.
>
> My proposal would be changing the minimum supported version *to 2.8.5*.
> This should simplify the Hadoop-related codebase and simplify the CI build
> infrastructure, as we won't have to test the older versions.
>
> Please note that this only involves minimal *client-side* compatibility.
> The wire protocol should remain compatible with earlier versions [2], so
> we should be able to talk to any servers in the 2.x major branch.
>
> One small note on the 2.8.x branch: some of the classes we need are only
> available in version 2.8.4 and above, but I'm not sure we should take an
> eventual need for upgrading a patch version into consideration here,
> because both 2.8.4 and 2.8.5 are pretty old.
>
> WDYT, is it already time to upgrade? Looking forward to any thoughts on
> the topic!
>
> [1]
> https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
> [2]
> https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
>
> Best,
> D.