Hi Robert,

I think it would be a helpful simplification of Flink's build setup if we
could get rid of flink-shaded-hadoop. Moreover, relying only on the vanilla
Hadoop dependencies for the modules which interact with Hadoop/YARN sounds
like a good idea to me.

Adding support for Hadoop 3 would also be nice. I'm not sure, though, how
Hadoop's APIs have changed between 2 and 3. It might be necessary to
introduce some bridges in order to make it work.
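To make the idea of a bridge a bit more concrete, here is a rough, hypothetical
sketch (not existing Flink code, and the class/method names are made up): one
option is to detect the Hadoop major version at runtime via
org.apache.hadoop.util.VersionInfo and dispatch to version-specific code paths.

import org.apache.hadoop.util.VersionInfo;

// Hypothetical sketch: detect the Hadoop major version at runtime
// and dispatch to version-specific code paths.
public final class HadoopVersionBridge {

    // Returns the Hadoop major version, e.g. 2 or 3.
    public static int hadoopMajorVersion() {
        String version = VersionInfo.getVersion(); // e.g. "3.2.1"
        return Integer.parseInt(version.split("\\.")[0]);
    }

    public static void configure() {
        if (hadoopMajorVersion() >= 3) {
            // use Hadoop-3-only APIs here (loaded reflectively if they
            // are not present on a Hadoop 2 classpath)
        } else {
            // fall back to the Hadoop 2 code path
        }
    }
}

Whether something this simple is enough obviously depends on how much the
YARN/HDFS APIs actually differ between the two major versions.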

Cheers,
Till

On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hi all,
>
> For the upcoming 1.11 release, I started looking into adding support for
> Hadoop 3 [1] for Flink. I have already explored adding a shaded Hadoop 3
> into “flink-shaded”, as well as some mechanisms for switching between
> Hadoop 2 and 3 dependencies in the Flink build.
>
> However, Chesnay made me aware that we could also go a different route: We
> let Flink depend on vanilla Hadoop dependencies and stop providing shaded
> fat jars for Hadoop through “flink-shaded”.
>
> Why?
> - Maintaining properly shaded Hadoop fat jars is a lot of work (we have
> insufficient test coverage for all kinds of Hadoop features)
> - For Hadoop 2, there are already some known and unresolved issues with our
> shaded jars that we didn’t manage to fix
>
> Users would then use Flink with Hadoop by relying on vanilla or
> vendor-provided Hadoop dependencies.
>
> What do you think?
>
> Best,
> Robert
>
> [1] https://issues.apache.org/jira/browse/FLINK-11086
>
