Hi,

I'd like to start a discussion about raising the minimum Hadoop version
that Flink supports.

Even though the default value of the `hadoop.version` property is set to
2.8.3, we still ensure both runtime and compile-time compatibility with
Hadoop 2.4.x via the scheduled pipeline [1].

Here is a list of the latest release dates for each minor version up to
2.8.x:

- Hadoop 2.4.1: Last commit on 6/30/2014
- Hadoop 2.5.2: Last commit on 11/15/2014
- Hadoop 2.6.5: Last commit on 10/11/2016
- Hadoop 2.7.7: Last commit on 7/18/2018
- Hadoop 2.8.5: Last commit on 9/8/2018

Since then, there have been two more minor releases on the 2.x branch and
four more on the 3.x branch.

Supporting the older versions involves reflection-based "hacks" to work
with multiple versions at once.
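For context, the reflection-based approach typically looks something like the sketch below: probe the classpath for an API that only exists in newer Hadoop versions and branch accordingly. This is a minimal illustration, not Flink's actual shim code; the class name `org.apache.hadoop.some.NewerOnlyClass` is a made-up placeholder.

```java
import java.lang.reflect.Method;

// Sketch only (hypothetical, not Flink's actual code): a reflection-based
// shim that detects whether a newer Hadoop API is available at runtime.
public class HadoopCompatShim {

    /** Returns true if the given class can be loaded from the classpath. */
    static boolean classAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Looks up a method reflectively; returns null if this Hadoop version lacks it. */
    static Method findMethod(Class<?> clazz, String name, Class<?>... params) {
        try {
            return clazz.getMethod(name, params);
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name standing in for an API added in a newer Hadoop.
        boolean newApi = classAvailable("org.apache.hadoop.some.NewerOnlyClass");
        System.out.println(newApi
                ? "using new code path"
                : "falling back to legacy code path");
    }
}
```

Dropping support for old versions lets such probes (and the duplicated code paths behind them) be replaced with direct calls.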

My proposal would be to raise the minimum supported version *to 2.8.5*.
This should simplify the Hadoop-related codebase as well as the CI build
infrastructure, as we would no longer have to test against the older
versions.

Please note that this only concerns the minimum *client-side*
compatibility. The wire protocol should remain compatible with earlier
versions [2], so we should still be able to talk to any servers on the 2.x
branch.

One small note on the 2.8.x branch: some of the classes we need are only
available in 2.8.4 and above. That said, I'm not sure we should factor a
possible future patch-version bump into this decision, since both 2.8.4
and 2.8.5 are already fairly old.

WDYT, is it already time to upgrade? Looking forward to any thoughts on
the topic!

[1]
https://github.com/apache/flink/blob/release-1.14.0/tools/azure-pipelines/build-apache-repo.yml#L123
[2]
https://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

Best,
D.
