Couldn't it simply be documented which jars are contained in the pre-built
convenience jars that can be downloaded from the website? Then people who
need a custom version would know which jars they have to provide to Flink.

Cheers,
Till

On Tue, Dec 17, 2019 at 6:49 PM Bowen Li <bowenl...@gmail.com> wrote:

> I'm not sure providing an uber jar would be possible.
>
> Unlike the Kafka and Elasticsearch connectors, which depend on a specific
> Kafka/Elasticsearch version, or the universal Kafka connector, which
> provides good compatibility, the Hive connector has to deal with Hive jars
> across all 1.x, 2.x, and 3.x versions (let alone all the HDP/CDH
> distributions), with incompatibilities even between minor versions, plus
> differently versioned Hadoop and other extra dependency jars for each Hive
> version.
>
> Besides, users usually need to be able to easily see which individual jars
> are required, which is invisible in an uber jar. Hive users already have
> their own Hive deployments, and they usually have to use their own Hive
> jars because, unlike the Hive jars on Maven, their jars contain in-house or
> vendor changes. They need to easily map the jars Flink requires for the
> corresponding open-source Hive version to their own Hive deployment, and
> copy the in-house jars over from that deployment as replacements.
>
> Providing a script to download all the individual jars for a specified hive
> version can be an alternative.
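Such a download script could be sketched roughly as below. This is purely illustrative: the jar lists per Hive version line are hypothetical placeholders, not the actual dependency lists from the Flink docs, and the script only prints the resolved Maven Central URLs rather than downloading them.

```shell
#!/usr/bin/env sh
# Hypothetical sketch: resolve the individual jars needed for a given Hive
# version and print their Maven Central URLs. Pipe through
# `xargs -n1 curl -O` to actually download. The jar lists below are
# illustrative only; the real lists would come from the Flink documentation.
HIVE_VERSION="${1:-2.3.4}"
MAVEN="https://repo1.maven.org/maven2"

case "$HIVE_VERSION" in
  1.*) JARS="org/apache/hive/hive-exec/$HIVE_VERSION/hive-exec-$HIVE_VERSION.jar
org/apache/hive/hive-metastore/$HIVE_VERSION/hive-metastore-$HIVE_VERSION.jar" ;;
  2.*|3.*) JARS="org/apache/hive/hive-exec/$HIVE_VERSION/hive-exec-$HIVE_VERSION.jar" ;;
  *) echo "unsupported Hive version: $HIVE_VERSION" >&2; exit 1 ;;
esac

# Print one resolvable URL per required jar.
for jar in $JARS; do
  echo "$MAVEN/$jar"
done
```

Unlike an uber jar, this keeps the individual jar names visible, so users can substitute their in-house builds jar by jar.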
>
> The goal is that we need to provide a *product*, not a technology, to make
> things less of a hassle for Hive users. After all, it's Flink embracing the
> Hive community and ecosystem, not the other way around. I'd argue the Hive
> connector can be treated differently because its community, ecosystem, and
> user base are much larger than those of the other connectors, and it's far
> more important than other connectors to Flink's mission of becoming a
> unified batch/streaming engine and getting Flink more widely adopted.
>
>
> On Sun, Dec 15, 2019 at 10:03 PM Danny Chan <yuzhao....@gmail.com> wrote:
>
> > Also -1 on separate builds.
> >
> > After looking at how some other big data engines handle distribution [1],
> > I didn't find a strong need to publish a separate build just for each
> > Hive version, though there are indeed builds for different Hadoop
> > versions.
> >
> > Just like Seth and Aljoscha said, we could publish a
> > flink-hive-version-uber.jar to use as a library for the SQL CLI or other
> > use cases.
> >
> > [1] https://spark.apache.org/downloads.html
> > [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> >
> > Best,
> > Danny Chan
> > On Dec 14, 2019, 3:03 AM +0800, dev@flink.apache.org wrote:
> > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies