Hi Robert,

I think it would be a helpful simplification of Flink's build setup if we could get rid of flink-shaded-hadoop. Moreover, relying only on vanilla Hadoop dependencies for the modules which interact with Hadoop/YARN sounds like a good idea to me.
Adding support for Hadoop 3 would also be nice. I'm not sure, though, how Hadoop's APIs have changed between versions 2 and 3. It might be necessary to introduce some bridges in order to make it work (a rough sketch of what I mean follows after the quoted message below).

Cheers,
Till

On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi all,
>
> for the upcoming 1.11 release, I started looking into adding support for
> Hadoop 3 [1] for Flink. I have already explored adding a shaded Hadoop 3
> into “flink-shaded”, as well as some mechanisms for switching between
> Hadoop 2 and 3 dependencies in the Flink build.
>
> However, Chesnay made me aware that we could also go a different route: we
> let Flink depend on vanilla Hadoop dependencies and stop providing shaded
> fat jars for Hadoop through “flink-shaded”.
>
> Why?
> - Maintaining properly shaded Hadoop fat jars is a lot of work (we have
> insufficient test coverage for all kinds of Hadoop features).
> - For Hadoop 2, there are already some known and unresolved issues with our
> shaded jars that we didn’t manage to fix.
>
> Users will have to use Flink with Hadoop by relying on vanilla or
> vendor-provided Hadoop dependencies.
>
> What do you think?
>
> Best,
> Robert
>
> [1] https://issues.apache.org/jira/browse/FLINK-11086
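To make the "bridges" idea above a bit more concrete, here is a minimal sketch of one way such a bridge could look: check the Hadoop version on the classpath at runtime (via Hadoop's real org.apache.hadoop.util.VersionInfo utility) and guard Hadoop-3-only calls behind reflection so the code still compiles and runs against Hadoop 2. The class name HadoopVersionBridge and the method name passed to invokeIfHadoop3 are hypothetical placeholders, not existing Flink or Hadoop APIs.

```java
import org.apache.hadoop.util.VersionInfo;

/**
 * Hypothetical sketch of a Hadoop 2/3 compatibility bridge.
 * Not an existing Flink class; illustrates one possible approach only.
 */
public final class HadoopVersionBridge {

    private HadoopVersionBridge() {}

    /** Returns true if the Hadoop version on the classpath is at least the given major version. */
    public static boolean isAtLeastMajorVersion(int major) {
        // VersionInfo.getVersion() returns e.g. "2.8.5" or "3.2.1"
        String[] parts = VersionInfo.getVersion().split("\\.");
        return Integer.parseInt(parts[0]) >= major;
    }

    /**
     * Invokes a (hypothetical) Hadoop-3-only, no-argument method on the given target
     * via reflection, so the calling code still compiles against Hadoop 2 dependencies.
     */
    public static void invokeIfHadoop3(Object target, String hadoop3OnlyMethodName) throws Exception {
        if (isAtLeastMajorVersion(3)) {
            target.getClass().getMethod(hadoop3OnlyMethodName).invoke(target);
        }
        // On Hadoop 2 the call is simply skipped (or an older equivalent could be used).
    }
}
```

Whether such runtime checks are needed at all depends on how much of the Hadoop 2 API surface Flink touches actually changed in Hadoop 3; for APIs that stayed source-compatible, compiling against vanilla Hadoop 2 dependencies and running against a user-provided Hadoop 3 classpath might already be enough.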