Re: [DISCUSS] Deprecate Hadoop source method from (batch) ExecutionEnvironment

Shannon Carey Fri, 14 Oct 2016 09:11:44 -0700

Yep!

From: Fabian Hueske <fhue...@gmail.com>
Date: Friday, October 14, 2016 at 11:00 AM
To: Shannon Carey <sca...@expedia.com>
Cc: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: [DISCUSS] Deprecate Hadoop source method from (batch) 
ExecutionEnvironment


Hi Shannon,

the plan is as follows:

We will keep the methods as they are for 1.2 but deprecate them, and at the 
same time we will add alternatives in an optional dependency.
In a later release, the deprecated methods will be removed and everybody will 
have to switch to the optional dependency.

Does that work for you?

Best, Fabian

2016-10-14 17:30 GMT+02:00 Shannon Carey <sca...@expedia.com>:
Speaking as a user, if you are suggesting that you will retain the 
functionality but move the methods to an optional dependency, it makes sense to 
me. We have used the Hadoop integration for AvroParquetInputFormat and 
CqlBulkOutputFormat in Flink (although we won't be using CqlBulkOutputFormat 
any longer because it doesn't seem to be reliable).

-Shannon

From: Fabian Hueske <fhue...@gmail.com>
Date: Friday, October 14, 2016 at 4:29 AM
To: <user@flink.apache.org>, "d...@flink.apache.org" <d...@flink.apache.org>
Subject: [DISCUSS] Deprecate Hadoop source method from (batch) 
ExecutionEnvironment

Hi everybody,

I would like to propose deprecating the utility methods to read data with 
Hadoop InputFormats from the (batch) ExecutionEnvironment.

The motivation for deprecating these methods is to reduce Flink's dependency on 
Hadoop and instead make Hadoop an optional dependency for users who actually 
need it (HDFS, MapRed-Compat, ...). Eventually, we want to have a Flink 
distribution that does not have a hard Hadoop dependency.

One step for this is to remove the Hadoop dependency from flink-java (Flink's 
Java DataSet API) which is currently required due to the above utility methods 
(see FLINK-4315). We recently received a PR that addresses FLINK-4315 and 
removes the Hadoop methods from the ExecutionEnvironment. After some 
discussion, it was decided to defer the PR to Flink 2.0 because it breaks the 
API (these methods are declared @PublicEvolving).

I propose to accept this PR for Flink 1.2, but to deprecate the methods instead 
of removing them.
This would help to migrate old code and prevent new usage of these methods.
In a later Flink release (1.3 or 2.0) we could remove these methods and the 
Hadoop dependency of flink-java.

What do others think?

Best, Fabian
