We have had much trouble in the past from "too deep, too custom"
integrations that everyone got out of the box, e.g., Hadoop.
Flink has such a broad spectrum of use cases that if we have a custom
build for every other framework in that spectrum, we'll be in trouble.

So I would also be -1 for custom builds.

Couldn't we do something similar to what we started doing for Hadoop?
Moving away from convenience downloads to allowing users to "export" their
setup for Flink?

  - We can have a "hive module (loader)" in flink/lib by default
  - The module loader would look for an environment variable like
"HIVE_CLASSPATH" and load these classes (ideally in a separate classloader).
  - The loader can search for certain classes and instantiate catalog /
functions / etc. when finding them instantiates the hive module referencing
them
  - That way, we use exactly what users have installed, without needing to
build our own bundles.
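
For illustration, a rough sketch of what such a loader could look like
(the class name, the marker class probed for, and the registration step
are made up for the example, not existing Flink API):

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class HiveModuleLoader {

    public static void main(String[] args) throws Exception {
        // 1. Resolve the user's Hive installation from the environment.
        String hiveClasspath = System.getenv("HIVE_CLASSPATH");
        if (hiveClasspath == null) {
            return; // no Hive setup exported, the module stays inactive
        }

        // 2. Load the user's jars in a separate classloader so Hive's
        //    dependencies don't leak onto Flink's own classpath.
        List<URL> urls = new ArrayList<>();
        for (String path : hiveClasspath.split(File.pathSeparator)) {
            urls.add(new File(path).toURI().toURL());
        }
        ClassLoader hiveLoader = new URLClassLoader(
                urls.toArray(new URL[0]),
                HiveModuleLoader.class.getClassLoader());

        // 3. Probe for a marker class and instantiate catalog /
        //    functions reflectively only if it is present.
        try {
            hiveLoader.loadClass("org.apache.hadoop.hive.conf.HiveConf");
            // Found a Hive installation: instantiate the catalog and
            // register the hive module against this classloader
            // (registration elided in this sketch).
        } catch (ClassNotFoundException e) {
            // Required Hive classes missing: skip module instantiation.
        }
    }
}

Note that a plain URLClassLoader delegates to the parent first; a
child-first classloader (as we already use for user code) would isolate
the Hive dependencies even better.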

Could that work?

Best,
Stephan


On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Couldn't it simply be documented which jars are in the pre-built
> convenience jars that can be downloaded from the website? Then people who
> need a custom version would know which jars they need to provide to Flink?
>
> Cheers,
> Till
>
> On Tue, Dec 17, 2019 at 6:49 PM Bowen Li <bowenl...@gmail.com> wrote:
>
> > I'm not sure providing an uber jar would be possible.
> >
> > Unlike the kafka and elasticsearch connectors, which have dependencies
> > on a specific kafka/elastic version, or the universal kafka connector,
> > which provides good compatibility, the hive connector needs to deal
> > with hive jars across all 1.x, 2.x, and 3.x versions (let alone all the
> > HDP/CDH distributions), with incompatibilities even between minor
> > versions and differently versioned hadoop and other extra dependency
> > jars for each hive version.
> >
> > Besides, users usually need to be able to easily see which individual
> > jars are required, which is invisible in an uber jar. Hive users
> > already have their own hive deployments. They usually have to use their
> > own hive jars because, unlike the hive jars on mvn, their own jars
> > contain in-house or vendor changes. They need to easily tell which jars
> > Flink requires for the open source hive version corresponding to their
> > own hive deployment, and copy the in-house jars over from their hive
> > deployments as replacements.
> >
> > Providing a script to download all the individual jars for a specified
> > hive version could be an alternative.
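> >
> > For illustration, a rough sketch of such a script (the artifact list
> > here is made up; a real script would maintain one list per supported
> > hive version):
> >
> > import java.io.InputStream;
> > import java.net.URL;
> > import java.nio.file.Files;
> > import java.nio.file.Path;
> > import java.nio.file.Paths;
> > import java.nio.file.StandardCopyOption;
> >
> > public class DownloadHiveDeps {
> >     private static final String REPO = "https://repo1.maven.org/maven2";
> >
> >     public static void main(String[] args) throws Exception {
> >         String hiveVersion = args[0]; // e.g. "2.3.6"
> >         // Hypothetical jar list, one entry per required dependency.
> >         String[] artifacts = {"hive-exec", "hive-metastore"};
> >         Path libDir = Paths.get("lib");
> >         Files.createDirectories(libDir);
> >         for (String artifact : artifacts) {
> >             String jar = artifact + "-" + hiveVersion + ".jar";
> >             URL url = new URL(REPO + "/org/apache/hive/" + artifact
> >                     + "/" + hiveVersion + "/" + jar);
> >             // Download each jar from maven central into lib/.
> >             try (InputStream in = url.openStream()) {
> >                 Files.copy(in, libDir.resolve(jar),
> >                         StandardCopyOption.REPLACE_EXISTING);
> >             }
> >         }
> >     }
> > }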
> >
> > The goal is to provide a *product*, not a technology, that makes things
> > less of a hassle for Hive users. After all, it's Flink embracing the
> > Hive community and ecosystem, not the other way around. I'd argue the
> > Hive connector can be treated differently because its
> > community/ecosystem/userbase is much larger than those of the other
> > connectors, and it's far more important than the other connectors to
> > Flink's mission of becoming a unified batch/streaming engine and
> > getting Flink more widely adopted.
> >
> >
> > On Sun, Dec 15, 2019 at 10:03 PM Danny Chan <yuzhao....@gmail.com>
> wrote:
> >
> > > Also -1 on separate builds.
> > >
> > > After looking at how some other big data engines handle their
> > > distributions [1], I didn't find a strong need to publish a separate
> > > build just for a separate Hive version, though there are indeed
> > > builds for different Hadoop versions.
> > >
> > > Just like Seth and Aljoscha said, we could push a
> > > flink-hive-version-uber.jar to use as a lib for the SQL CLI or other
> > > use cases.
> > >
> > > [1] https://spark.apache.org/downloads.html
> > > [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> > >
> > > Best,
> > > Danny Chan
> > > On Dec 14, 2019, 3:03 AM +0800, dev@flink.apache.org wrote:
> > > >
> > > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
