Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-05-04 Thread Chesnay Schepler
@Konstantin It is part a) since it is not used by all hadoop users (as they may be using their existing hadoop infrastructure or one provided by the cloud service), but I'd say it's mostly for maintenance reasons. The reality is that we cannot truly maintain flink-shaded-hadoop. There are too

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-05-04 Thread Ufuk Celebi
Hey Robert and others, overall +1 to support Hadoop 3. It would be great to unblock Flink support in EMR 6.0 as noted in the linked FLINK ticket. The arguments raised against flink-shaded-hadoop make sense to me. I have a few general questions still: 1) Will the flink-shaded-hadoop module (in

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-05-03 Thread Konstantin Knauf
Hi Chesnay, Hi Robert, I have a bit of a naive question. I assume the reason for introducing flink-shaded-hadoop was dependency conflicts between Hadoop, Flink and/or user code. When we drop it now, is it because a) it was not worth it (value provided did not justify maintenance overhead and issu

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-26 Thread Stephan Ewen
Indeed, that would be the assumption, that Hadoop does not expose its transitive libraries on its public API surface. From vague memory, I think that is pretty much true so far. I only remember Kinesis and Calcite as counterexamples, which exposed Guava classes as part of the public API. But that is

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-23 Thread Chesnay Schepler
This would only work so long as all Hadoop APIs do not directly expose any transitive non-hadoop dependency. Otherwise the user code classloader might search for this transitive dependency in lib instead of the hadoop classpath (and possibly not find it). On 23/04/2020 11:34, Stephan Ewen wrot
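To make the concern above concrete, here is a minimal toy model in Python (not Flink code). It assumes a hypothetical delegation rule in which only names starting with `org.apache.hadoop.` are routed to a dedicated Hadoop classloader, while everything else is looked up in lib. The function name, the class sets, and the `com.example.thirdparty.Token` type are invented for illustration; the sketch does not claim that any real Hadoop API exposes such a type.

```python
# Toy model: which classpath would serve a class under a hypothetical
# "delegate org.apache.hadoop.* to a Hadoop classloader" rule.
HADOOP_PREFIX = "org.apache.hadoop."


def lookup(class_name, lib_classes, hadoop_classes):
    """Return 'hadoop', 'lib', or None if the class cannot be found."""
    if class_name.startswith(HADOOP_PREFIX):
        # Hadoop's own classes are delegated to the Hadoop classpath.
        return "hadoop" if class_name in hadoop_classes else None
    # Everything else, including Hadoop's transitive dependencies,
    # is searched in lib only.
    return "lib" if class_name in lib_classes else None


# Hypothetical contents of the two classpaths.
lib = {"org.apache.flink.api.common.JobID"}
hadoop = {
    "org.apache.hadoop.conf.Configuration",
    "com.example.thirdparty.Token",  # transitive dependency, only on the Hadoop classpath
}

print(lookup("org.apache.hadoop.conf.Configuration", lib, hadoop))
print(lookup("com.example.thirdparty.Token", lib, hadoop))
```

Under these assumptions the second lookup fails: if a Hadoop API returned a `Token`, user code could not resolve that type even though it is present on the Hadoop classpath.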

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-23 Thread Stephan Ewen
True, connectors built on Hadoop make this a bit more complex. That is also the reason why Hadoop is on the "parent first" patterns. Maybe this is a bit of a wild thought, but what would happen if we had a "first class" notion of a Hadoop Classloader in the system, and the user code classloader wo
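As a rough illustration of what being on the "parent first" patterns means, the following Python sketch models child-first class resolution with a list of parent-first prefixes. It is a simplified model, not Flink's implementation; the prefix list, the function name, and the class sets are assumptions chosen for the example.

```python
# Toy model of child-first resolution with "parent first" prefixes.
# The prefix list below is illustrative, not Flink's actual default.
PARENT_FIRST_PREFIXES = ("java.", "org.apache.flink.", "org.apache.hadoop.")


def resolve(class_name, parent_classes, user_classes,
            parent_first=PARENT_FIRST_PREFIXES):
    """Return which loader ('parent' or 'user') supplies the class, or None."""
    if class_name.startswith(tuple(parent_first)):
        # Matching classes are always tried in the parent classloader first.
        order = (("parent", parent_classes), ("user", user_classes))
    else:
        # All other classes are tried in the user-code classloader first.
        order = (("user", user_classes), ("parent", parent_classes))
    for loader_name, classes in order:
        if class_name in classes:
            return loader_name
    return None


# Hypothetical classpaths: both contain Hadoop and Guava classes.
parent = {"org.apache.hadoop.fs.FileSystem",
          "com.google.common.base.Preconditions"}
user = {"org.apache.hadoop.fs.FileSystem",
        "com.google.common.base.Preconditions",
        "com.example.Job"}

print(resolve("org.apache.hadoop.fs.FileSystem", parent, user))
print(resolve("com.google.common.base.Preconditions", parent, user))
```

In this model the Hadoop class always comes from the parent even when the user jar bundles its own copy, while the Guava class comes from the user jar because its name matches no parent-first prefix.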

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-23 Thread Chesnay Schepler
although, if you can load the HADOOP_CLASSPATH as a plugin, then you can also load it in the user-code classloader. On 23/04/2020 10:50, Chesnay Schepler wrote: @Stephan I'm not aware of anyone having tried that; possibly since we have various connectors that require hadoop (hadoop-compat, hive

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-23 Thread Chesnay Schepler
@Stephan I'm not aware of anyone having tried that; possibly since we have various connectors that require hadoop (hadoop-compat, hive, orc/parquet/hbase, hadoop inputformats). This would require connectors to be loaded as plugins (or having access to the plugin classloader) to be feasible. O

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-23 Thread Stephan Ewen
Hi all! +1 for the simplification of dropping hadoop-shaded. Have we ever investigated how much work it would be to load the HADOOP_CLASSPATH through the plugin loader? Then Hadoop's crazy dependency footprint would not spoil the main classpath. - HDFS might be very simple, because file system

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Xintong Song
+1 for supporting Hadoop 3. I'm not familiar with the shading efforts, thus no comment on dropping the flink-shaded-hadoop. Correct me if I'm wrong. Although the default Hadoop version for compiling Flink is currently 2.4.1, I think this does not mean Flink should support only Hadoop 2.4+. So

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Chesnay Schepler
1) Likely not, as this again introduces a hard-dependency on flink-shaded-hadoop. 2) Indeed; this will be something the user/cloud providers have to deal with now. 3) Yes. As a small note, we can still keep the hadoop-2 version of flink-shaded around for existing users. What I suggested was t

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Yang Wang
Thanks Robert for starting this significant discussion. Hadoop 3 has been released for a long time and many companies have already put it in production. Whether or not you are using flink-shaded-hadoop2, Flink can currently already run on YARN 3 (not sure about HDFS). Since the yarn api is al

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Sivaprasanna
I agree with Aljoscha. Otherwise I can see a lot of tickets getting created saying the application is not running on YARN. Cheers, Sivaprasanna On Wed, Apr 22, 2020 at 1:00 PM Aljoscha Krettek wrote: > +1 to getting rid of flink-shaded-hadoop. But we need to document how > people can now get a

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Aljoscha Krettek
+1 to getting rid of flink-shaded-hadoop. But we need to document how people can now get a Flink dist that works with Hadoop. Currently, when you download the single shaded jar you immediately get support for submitting to YARN via bin/flink run. Aljoscha On 22.04.20 09:08, Till Rohrmann wro

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Till Rohrmann
Hi Robert, I think it would be a helpful simplification of Flink's build setup if we can get rid of flink-shaded-hadoop. Moreover relying only on the vanilla Hadoop dependencies for the modules which interact with Hadoop/Yarn sounds like a good idea to me. Adding support for Hadoop 3 would also b

Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-21 Thread LINZ, Arnaud
Hello, I think that's a good idea. I suppose that most corporate users use a vendor version and already compile their flink version with vendor repos anyway. Arnaud PS - FYI, for CDH6 (based on hadoop3), I've managed to build a 1.10 version by modifying pom.xml files and using "hidden" Cloudera p