We have had much trouble in the past from "too deep, too custom" integrations that everyone got out of the box, e.g., Hadoop. Flink has such a broad spectrum of use cases that if we need a custom build for every framework in that spectrum, we'll be in trouble.
So I would also be -1 for custom builds.

Couldn't we do something similar to what we started doing for Hadoop, moving away from convenience downloads and toward letting users "export" their setup for Flink?

- We can have a "hive module (loader)" in flink/lib by default.
- The module loader would look for an environment variable like "HIVE_CLASSPATH" and load the classes found there (ideally in a separate classloader).
- The loader can search for certain well-known classes and, when it finds them, instantiate the catalog / functions / etc. and register a Hive module referencing them (see the sketch below the quoted thread).

That way, we use exactly what users have installed, without needing to build our own bundles.

Could that work?

Best,
Stephan

On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Couldn't we simply document which jars are contained in the pre-built
> convenience jars that can be downloaded from the website? Then people who
> need a custom version would know which jars they have to provide to Flink.
>
> Cheers,
> Till
>
> On Tue, Dec 17, 2019 at 6:49 PM Bowen Li <bowenl...@gmail.com> wrote:
>
> > I'm not sure providing an uber jar would be possible.
> >
> > Unlike the kafka and elasticsearch connectors, which depend on one
> > specific kafka/elastic version, or the universal kafka connector, which
> > provides good compatibility guarantees, the hive connector has to deal
> > with hive jars across all 1.x, 2.x, and 3.x versions (let alone all the
> > HDP/CDH distributions), with incompatibilities even between minor
> > versions, plus differently versioned hadoop and other extra dependency
> > jars for each hive version.
> >
> > Besides, users usually need to be able to easily see which individual
> > jars are required, and that is invisible in an uber jar. Hive users
> > already have their own hive deployments, and they usually have to use
> > their own hive jars because, unlike the hive jars on mvn, their jars
> > contain in-house or vendor changes. They need to easily tell which jars
> > Flink requires for the corresponding open source hive version, and then
> > copy their in-house jars over from the hive deployment as replacements.
> >
> > Providing a script that downloads all the individual jars for a
> > specified hive version could be an alternative.
> >
> > The goal is to provide a *product*, not a technology, so that there is
> > less hassle for Hive users. After all, it's Flink embracing the Hive
> > community and ecosystem, not the other way around. I'd argue the Hive
> > connector can be treated differently because its community, ecosystem,
> > and user base are much larger than those of the other connectors, and
> > it matters far more than other connectors to Flink's mission of becoming
> > a unified batch/streaming engine and getting Flink more widely adopted.
> >
> > On Sun, Dec 15, 2019 at 10:03 PM Danny Chan <yuzhao....@gmail.com> wrote:
> >
> > > Also -1 on separate builds.
> > >
> > > After looking at how some other big data engines handle their
> > > distributions [1], I didn't find a strong need to publish a separate
> > > build per Hive version; there are, however, builds for different
> > > Hadoop versions.
> > >
> > > Just like Seth and Aljoscha said, we could publish a
> > > flink-hive-version-uber.jar to use as a lib for the SQL CLI or other
> > > use cases.
> > >
> > > [1] https://spark.apache.org/downloads.html
> > > [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> > >
> > > Best,
> > > Danny Chan
> > >
> > > On Dec 14, 2019, 3:03 AM +0800, dev@flink.apache.org wrote:
> > > >
> > > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
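
To make the module-loader idea at the top of this thread a bit more concrete, here is a rough sketch. It is only an illustration under assumptions, not existing Flink code: the class HiveModuleLoader, the method discoverHiveModule, and the particular marker class are all hypothetical choices.

    // Hypothetical sketch of the loader described above: read a
    // HIVE_CLASSPATH environment variable, load the jars it points to in a
    // child classloader, and probe for a marker class before wiring up any
    // Hive-backed components. HiveModuleLoader / discoverHiveModule are
    // illustrative names, not existing Flink APIs.

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Optional;

    public final class HiveModuleLoader {

        // A class we would expect to find in any usable Hive installation
        // (assumption: this class is present across the Hive versions we care about).
        private static final String HIVE_MARKER_CLASS =
                "org.apache.hadoop.hive.metastore.IMetaStoreClient";

        public static Optional<ClassLoader> discoverHiveModule() throws Exception {
            String hiveClasspath = System.getenv("HIVE_CLASSPATH");
            if (hiveClasspath == null || hiveClasspath.isEmpty()) {
                // No Hive installation exported by the user; skip the module.
                return Optional.empty();
            }

            List<URL> jarUrls = new ArrayList<>();
            for (String entry : hiveClasspath.split(File.pathSeparator)) {
                jarUrls.add(new File(entry).toURI().toURL());
            }

            // Separate child classloader so the user's Hive/Hadoop jars do not
            // leak into, or clash with, Flink's own classpath.
            URLClassLoader hiveLoader = new URLClassLoader(
                    jarUrls.toArray(new URL[0]),
                    HiveModuleLoader.class.getClassLoader());

            try {
                // Probe for the marker class without initializing it; only if it
                // resolves would we go on to instantiate catalog/function support.
                Class.forName(HIVE_MARKER_CLASS, false, hiveLoader);
                return Optional.of(hiveLoader);
            } catch (ClassNotFoundException e) {
                hiveLoader.close();
                return Optional.empty();
            }
        }
    }

A caller would then use the returned classloader to reflectively instantiate the catalog / function classes, so we pick up exactly the jars the user has installed while keeping them isolated from Flink's own dependencies.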