+1 for dropping Hadoop 2.2.0

Regards,
Chiwan Park

> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <u...@apache.org> wrote:
>
> +1 to what Robert said.
>
> On Thursday, September 3, 2015, Robert Metzger <rmetz...@apache.org> wrote:
> I think most cloud providers moved beyond Hadoop 2.2.0.
> Google's Click-To-Deploy is on 2.4.1
> AWS EMR is on 2.6.0
>
> The situation for the distributions seems to be the following:
> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
>
> HDP 2.0 (October 2013) is using 2.2.0
> HDP 2.1 (April 2014) uses 2.4.0 already
>
> So both vendors and cloud providers are multiple releases away from Hadoop
> 2.2.0.
>
> Spark does not offer a binary distribution lower than 2.3.0.
>
> In addition to that, I don't think that the HDFS client in 2.2.0 is really
> usable in production environments. Users were reporting ArrayIndexOutOfBounds
> exceptions for some jobs, I also had these exceptions sometimes.
>
> The easiest approach to resolve this issue would be (a) dropping the
> support for Hadoop 2.2.0
> An alternative approach (b) would be:
> - ship a binary version for Hadoop 2.3.0
> - make the source of Flink still compatible with 2.2.0, so that users can
>   compile a Hadoop 2.2.0 version if needed.
>
> I would vote for approach (a).
>
>
> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <trohrm...@apache.org> wrote:
> While working on high availability (HA) for Flink's YARN execution I stumbled
> across some limitations with Hadoop 2.2.0. From version 2.2.0 to 2.3.0,
> Hadoop introduced new functionality which is required for an efficient HA
> implementation. Therefore, I was wondering whether there is actually a need
> to support Hadoop 2.2.0. Is Hadoop 2.2.0 still actively used by someone?
>
> Cheers,
> Till
>