I'm not sure providing an uber jar would be possible.

Unlike the Kafka and Elasticsearch connectors, which depend on a specific
Kafka/Elasticsearch version, or the universal Kafka connector, which
provides good compatibility across versions, the Hive connector has to
deal with Hive jars across all 1.x, 2.x, and 3.x versions (let alone all
the HDP/CDH distributions), with incompatibilities even between minor
versions, plus differently versioned Hadoop and other extra dependency
jars for each Hive version.

Besides, users usually need to be able to easily see which individual jars
are required, and that is invisible in an uber jar. Hive users already have
their own Hive deployments, and they often have to use their own Hive jars
because, unlike the Hive jars on mvn, theirs contain in-house or vendor
changes. They need to easily tell which jars Flink requires for the open
source Hive version corresponding to their deployment, and then copy their
in-house jars over from the Hive deployment as replacements.

Providing a script that downloads all the individual jars for a specified
Hive version could be an alternative.
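To make the idea concrete, here is a minimal sketch of what such a script
could look like. The artifact list and the Maven Central layout used below
are illustrative assumptions for the sake of the example, not the actual
set of jars Flink would require for any given Hive version:

```shell
# Hypothetical sketch: print the Maven Central download URLs for the jars
# of a given Hive version; a real script would curl/wget each URL and
# would derive the artifact list per Hive version.
hive_jar_urls() {
  version="$1"
  repo="https://repo1.maven.org/maven2/org/apache/hive"
  # Illustrative artifact list only -- the real list depends on the
  # Hive version and its Hadoop/extra dependencies.
  for artifact in hive-exec hive-metastore; do
    echo "$repo/$artifact/$version/$artifact-$version.jar"
  done
}

# Example: hive_jar_urls 2.3.6 | xargs -n1 curl -LO
```

Users with in-house jars could then simply skip the download for the
artifacts they replace from their own deployment.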

The goal is to provide a *product*, not a technology, to reduce the hassle
for Hive users. After all, it's Flink embracing the Hive community and
ecosystem, not the other way around. I'd argue the Hive connector can be
treated differently because its community/ecosystem/user base is much
larger than the other connectors', and it matters far more than the other
connectors to Flink's mission of becoming a unified batch/streaming engine
and getting more widely adopted.


On Sun, Dec 15, 2019 at 10:03 PM Danny Chan <yuzhao....@gmail.com> wrote:

> Also -1 on separate builds.
>
> After looking at how some other big data engines handle distribution[1],
> I didn't find a strong need to publish a separate build for each Hive
> version; there are, indeed, builds for different Hadoop versions.
>
> Just like Seth and Aljoscha said, we could push a
> flink-hive-version-uber.jar to use as a lib of SQL-CLI or other use cases.
>
> [1] https://spark.apache.org/downloads.html
> [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
>
> Best,
> Danny Chan
> On Dec 14, 2019 at 3:03 AM +0800, dev@flink.apache.org wrote:
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
>

Reply via email to