@Konstantin
It is in part a), since it is not used by all Hadoop users (they may be
using their existing Hadoop infrastructure or one provided by a cloud
service), but I'd say it's mostly for maintenance reasons.
The reality is that we cannot truly maintain flink-shaded-hadoop.
There are too
Hey Robert and others,
overall +1 to support Hadoop 3. It would be great to unblock Flink
support in EMR 6.0, as noted in the linked FLINK ticket.
The arguments raised against flink-shaded-hadoop make sense to me. I have a
few general questions still:
1) Will the flink-shaded-hadoop module (in
Hi Chesnay, Hi Robert,
I have a bit of a naive question. I assume the reason for introducing
flink-shaded-hadoop was dependency conflicts between Hadoop, Flink and/or
user code. When we drop it now, is it because
a) it was not worth it (value provided did not justify maintenance overhead
and issu
Indeed, that would be the assumption, that Hadoop does not expose its
transitive libraries on its public API surface.
From vague memory, I think that's been pretty much true so far. I only
remember Kinesis and Calcite as counter-examples, which exposed Guava
classes as part of their public API.
But that is
This would only work so long as all Hadoop APIs do not directly expose
any transitive non-hadoop dependency.
Otherwise the user code classloader might search for this transitive
dependency in lib instead of the hadoop classpath (and possibly not find
it).
On 23/04/2020 11:34, Stephan Ewen wrote:
True, connectors built on Hadoop make this a bit more complex. That is also
the reason why Hadoop is on the "parent first" patterns.
Maybe this is a bit of a wild thought, but what would happen if we had a
"first class" notion of a Hadoop Classloader in the system, and the user
code classloader wo
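A rough sketch of the delegation model under discussion, assuming Flink-style "parent-first patterns" (class names, structure, and the pattern list are illustrative; this is not Flink's actual ChildFirstClassLoader). With "org.apache.hadoop." on the list, Hadoop's own classes resolve against lib/, while Hadoop's transitive dependencies do not, which is exactly why an API that leaks those transitives is problematic:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a child-first classloader that still delegates
// configured package prefixes to its parent, in the spirit of Flink's
// classloader.parent-first-patterns options.
public class ChildFirstClassLoader extends URLClassLoader {
    private final List<String> parentFirstPatterns;

    public ChildFirstClassLoader(URL[] urls, ClassLoader parent,
                                 List<String> parentFirstPatterns) {
        super(urls, parent);
        this.parentFirstPatterns = parentFirstPatterns;
    }

    // Decides whether a class name must be resolved by the parent first.
    public static boolean isParentFirst(String className, List<String> patterns) {
        return patterns.stream().anyMatch(className::startsWith);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                if (isParentFirst(name, parentFirstPatterns)) {
                    // e.g. org.apache.hadoop.* -> resolved from lib/
                    c = super.loadClass(name, false);
                } else {
                    try {
                        // child-first: try this loader's own URLs first;
                        // a transitive Hadoop dependency that is neither
                        // here nor in lib/ fails at this point
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        c = super.loadClass(name, false); // fall back
                    }
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    public static void main(String[] args) {
        List<String> patterns =
                Arrays.asList("java.", "org.apache.flink.", "org.apache.hadoop.");
        System.out.println(
                isParentFirst("org.apache.hadoop.fs.FileSystem", patterns));
        System.out.println(
                isParentFirst("com.google.common.base.Preconditions", patterns));
    }
}
```

The point of the sketch: a Hadoop class is delegated parent-first, but a Guava class pulled in transitively by Hadoop is not, so whether it is found depends on where it physically sits.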
although, if you can load the HADOOP_CLASSPATH as a plugin, then you can
also load it in the user-code classloader.
On 23/04/2020 10:50, Chesnay Schepler wrote:
@Stephan I'm not aware of anyone having tried that; possibly since we
have various connectors that require hadoop (hadoop-compat, hive,
orc/parquet/hbase, hadoop inputformats). This would require connectors
to be loaded as plugins (or having access to the plugin classloader) to
be feasible.
Hi all!
+1 for the simplification of dropping hadoop-shaded
Have we ever investigated how much work it would be to load the
HADOOP_CLASSPATH through the plugin loader? Then Hadoop's crazy dependency
footprint would not spoil the main classpath.
- HDFS might be very simple, because file system
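One way the "load the HADOOP_CLASSPATH through the plugin loader" idea could look, as a minimal sketch: turn the HADOOP_CLASSPATH entries into URLs and wrap them in an isolated classloader with a null parent, so Hadoop's dependency footprint never touches the main classpath. Names here are illustrative; this is not Flink's plugin mechanism:

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build an isolated classloader over a
// HADOOP_CLASSPATH-style string (entries separated by the platform's
// path separator).
public class HadoopClasspathLoader {

    // Converts a classpath string into file URLs, skipping empty entries.
    public static URL[] toUrls(String classpath) {
        List<URL> urls = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            try {
                urls.add(new File(entry).toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException(
                        "bad classpath entry: " + entry, e);
            }
        }
        return urls.toArray(new URL[0]);
    }

    public static void main(String[] args) {
        String hadoopCp =
                System.getenv().getOrDefault("HADOOP_CLASSPATH", "");
        // parent = null: only bootstrap classes are shared, so Hadoop's
        // transitive dependencies stay invisible to the application.
        URLClassLoader hadoopLoader = new URLClassLoader(toUrls(hadoopCp), null);
        System.out.println(
                "hadoop classpath entries: " + hadoopLoader.getURLs().length);
    }
}
```

As the thread notes, this only works cleanly if no Hadoop API surface exposes classes that live solely inside this isolated loader.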
+1 for supporting Hadoop 3.
I'm not familiar with the shading efforts, thus no comment on dropping the
flink-shaded-hadoop.
Correct me if I'm wrong: although the default Hadoop version for compiling
Flink is currently 2.4.1, I don't think this means Flink should
support only Hadoop 2.4+. So
1) Likely not, as this again introduces a hard dependency on
flink-shaded-hadoop.
2) Indeed; this will be something the user/cloud providers have to deal
with now.
3) Yes.
As a small note, we can still keep the hadoop-2 version of flink-shaded
around for existing users.
What I suggested was t
Thanks Robert for starting this significant discussion.
Since Hadoop 3 was released a long time ago, many companies have already
put it in production. Whether or not you are using flink-shaded-hadoop2,
Flink can currently already run on YARN 3 (not sure about HDFS). Since the
YARN API is al
I agree with Aljoscha. Otherwise I can see a lot of tickets getting created
saying the application is not running on YARN.
Cheers,
Sivaprasanna
On Wed, Apr 22, 2020 at 1:00 PM Aljoscha Krettek
wrote:
+1 to getting rid of flink-shaded-hadoop. But we need to document how
people can now get a Flink dist that works with Hadoop. Currently, when
you download the single shaded jar, you immediately get support for
submitting to YARN via bin/flink run.
Aljoscha
On 22.04.20 09:08, Till Rohrmann wrote:
Hi Robert,
I think it would be a helpful simplification of Flink's build setup if we
can get rid of flink-shaded-hadoop. Moreover, relying only on the vanilla
Hadoop dependencies for the modules which interact with Hadoop/YARN sounds
like a good idea to me.
Adding support for Hadoop 3 would also b
Hello,
I think that's a good idea. I suppose that most corporate users use a vendor
version and already compile their Flink version against vendor repos anyway.
Arnaud
PS - FYI, for CDH6 (based on hadoop3), I've managed to build a 1.10 version by
modifying pom.xml files and using "hidden" Cloudera p