I created a Jira for this: https://issues.apache.org/jira/browse/FLINK-2643
On Fri, 4 Sep 2015 at 13:01 Matthias J. Sax <mj...@apache.org> wrote:

> +1 for dropping
>
> On 09/04/2015 11:04 AM, Maximilian Michels wrote:
> > +1 for dropping Hadoop 2.2.0 binary and source-compatibility. The
> > release is hardly used and complicates the important high-availability
> > changes in Flink.
> >
> > On Fri, Sep 4, 2015 at 9:33 AM, Stephan Ewen <se...@apache.org> wrote:
> >> I am good with that as well. Mind that we are not only dropping a binary
> >> distribution for Hadoop 2.2.0, but also the source compatibility with
> >> 2.2.0.
> >>
> >> Lets also reconfigure Travis to test
> >>
> >> - Hadoop1
> >> - Hadoop 2.3
> >> - Hadoop 2.4
> >> - Hadoop 2.6
> >> - Hadoop 2.7
> >>
> >>
> >> On Fri, Sep 4, 2015 at 6:19 AM, Chiwan Park <chiwanp...@apache.org> wrote:
> >>>
> >>> +1 for dropping Hadoop 2.2.0
> >>>
> >>> Regards,
> >>> Chiwan Park
> >>>
> >>>> On Sep 4, 2015, at 5:58 AM, Ufuk Celebi <u...@apache.org> wrote:
> >>>>
> >>>> +1 to what Robert said.
> >>>>
> >>>> On Thursday, September 3, 2015, Robert Metzger <rmetz...@apache.org>
> >>>> wrote:
> >>>> I think most cloud providers moved beyond Hadoop 2.2.0.
> >>>> Google's Click-To-Deploy is on 2.4.1
> >>>> AWS EMR is on 2.6.0
> >>>>
> >>>> The situation for the distributions seems to be the following:
> >>>> MapR 4 uses Hadoop 2.4.0 (current is MapR 5)
> >>>> CDH 5.0 uses 2.3.0 (the current CDH release is 5.4)
> >>>>
> >>>> HDP 2.0 (October 2013) is using 2.2.0
> >>>> HDP 2.1 (April 2014) uses 2.4.0 already
> >>>>
> >>>> So both vendors and cloud providers are multiple releases away from
> >>>> Hadoop 2.2.0.
> >>>>
> >>>> Spark does not offer a binary distribution lower than 2.3.0.
> >>>>
> >>>> In addition to that, I don't think that the HDFS client in 2.2.0 is
> >>>> really usable in production environments. Users were reporting
> >>>> ArrayIndexOutOfBounds exceptions for some jobs, I also had these
> >>>> exceptions sometimes.
> >>>>
> >>>> The easiest approach to resolve this issue would be (a) dropping the
> >>>> support for Hadoop 2.2.0
> >>>> An alternative approach (b) would be:
> >>>> - ship a binary version for Hadoop 2.3.0
> >>>> - make the source of Flink still compatible with 2.2.0, so that users
> >>>> can compile a Hadoop 2.2.0 version if needed.
> >>>>
> >>>> I would vote for approach (a).
> >>>>
> >>>>
> >>>> On Tue, Sep 1, 2015 at 5:01 PM, Till Rohrmann <trohrm...@apache.org>
> >>>> wrote:
> >>>> While working on high availability (HA) for Flink's YARN execution I
> >>>> stumbled across some limitations with Hadoop 2.2.0. From version 2.2.0
> >>>> to 2.3.0, Hadoop introduced new functionality which is required for an
> >>>> efficient HA implementation. Therefore, I was wondering whether there
> >>>> is actually a need to support Hadoop 2.2.0. Is Hadoop 2.2.0 still
> >>>> actively used by someone?
> >>>>
> >>>> Cheers,
> >>>> Till