Hi Martijn, from personal experience, most Hadoop users lag far behind the release lines, because upgrading a Hadoop cluster is not a simple task. I think we can stay a bit conservative for now; nothing blocks us from using 2.8.5, as we don't use any "newer" APIs in the code.
As for Till's concern, we can still wrap the reflection-based logic so that it is skipped in case of "NoClassDefFound" instead of "ClassNotFound", as we do now.

D.

On Tue, Dec 14, 2021 at 5:23 PM Martijn Visser <[email protected]> wrote:

> Hi David,
>
> Thanks for bringing this up for discussion! Given that Hadoop 2.8 is
> considered EOL, shouldn't we bump the version to Hadoop 2.10? [1]
>
> Best regards,
>
> Martijn
>
> [1]
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Active+Release+Lines
>
> On Tue, 14 Dec 2021 at 10:28, Till Rohrmann <[email protected]> wrote:
>
> > Hi David,
> >
> > I think we haven't updated our Hadoop dependencies in a long time. Hence,
> > it is probably time to do so. So +1 for upgrading to the latest patch
> > release.
> >
> > If newer 2.x Hadoop versions are compatible with 2.y with x >= y, then I
> > don't see a problem with dropping support for pre-bundled Hadoop versions <
> > 2.8. This could indeed help us decrease our build matrix a bit and, thus,
> > saving some build time.
> >
> > Concerning simplifying our code base to get rid of reflection logic etc. we
> > still might have to add a safeguard for features that are not supported by
> > earlier versions. According to the docs
> >
> > > YARN applications that attempt to use new APIs (including new fields in
> > > data structures) that have not yet been deployed to the cluster can expect
> > > link exceptions
> >
> > we can see link exceptions. We could get around this by saying that Flink
> > no longer supports Hadoop < 2.8. But this should be checked with our users
> > on the user ML at least.
> >
> > Cheers,
> > Till
> >
> > On Tue, Dec 14, 2021 at 9:25 AM David Morávek <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I'd like to start a discussion about upgrading a minimal Hadoop version
> > > that Flink supports.
> > >
> > > Even though the default value for `hadoop.version` property is set to
> > > 2.8.3, we're still ensuring both runtime and compile compatibility with
> > > Hadoop 2.4.x with the scheduled pipeline[1].
> > >
> > > Here is list of dates of the latest releases for each minor version up to
> > > 2.8.x
> > >
> > > - Hadoop 2.4.1: Last commit on 6/30/2014
> > > - Hadoop 2.5.2: Last commit on 11/15/2014
> > > - Hadoop 2.6.5: Last commit on 10/11/2016
> > > - Hadoop 2.7.7: Last commit on 7/18/2018
> > > - Hadoop 2.8.5: Last commit on 9/8/2018
> > >
> > > Since then there were two more minor releases in 2.x branch and four more
> > > minor releases in 3.x branch.
> > >
> > > Supporting the older version involves reflection-based "hacks" for
> > > supporting multiple versions.
> > >
> > > My proposal would be changing the minimum supported version *to 2.8.5*.
> > > This should simplify the hadoop related codebase and simplify the CI build
> > > infrastructure as we won't have to test for the older versions.
> > >
> > > Please note that this only involves a minimal *client side* compatibility.
> > > The wire protocol should remain compatible with earlier versions [2], so we
> > > should be able to talk with any servers in 2.x major branch.
> > >
> > > One small note for the 2.8.x branch, some of the classes we need are only
> > > available in 2.8.4 version and above, but I'm not sure we should take an
> > > eventual need for upgrading a patch version into consideration here,
> > > because both 2.8.4 and 2.8.5 are pretty old.
> > >
> > > WDYT, is it already time to upgrade? Looking forward for any thoughts on
> > > the topic!
> > >
> > > [1]
> > > https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
> > > [2]
> > > https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility
> > >
> > > Best,
> > > D.
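
For illustration only, the safeguard mentioned at the top of this reply (skipping an optional code path on a link error rather than on a failed reflective lookup) could look roughly like the sketch below. The names `OptionalFeatureGuard` and `runIfLinkable` are hypothetical and not taken from the Flink code base, and the missing API is simulated by throwing the error directly, so treat this as a sketch under those assumptions rather than the actual implementation.

```java
import java.util.function.Supplier;

/** Hypothetical helper (not part of Flink): guards calls that may hit newer Hadoop APIs. */
public final class OptionalFeatureGuard {

    private OptionalFeatureGuard() {}

    /**
     * Runs the action and returns its result. If the JVM cannot link against a class
     * or method that is missing from the Hadoop version on the classpath, the action
     * is skipped and the fallback is returned instead.
     */
    public static <T> T runIfLinkable(Supplier<T> action, T fallback) {
        try {
            return action.get();
        } catch (NoClassDefFoundError | NoSuchMethodError e) {
            // Direct (non-reflective) use of a missing class fails with
            // NoClassDefFoundError, whereas Class.forName fails with
            // ClassNotFoundException.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Simulated call into an API that is absent at runtime (assumption for the demo).
        String result = runIfLinkable(
                () -> { throw new NoClassDefFoundError("org/example/NewerApi"); },
                "skipped");
        System.out.println(result);
    }
}
```

Catching the error at the call site keeps the rest of the code free of reflection, at the cost of only discovering the missing API when the guarded path first runs.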
