Some thoughts about other options we have:

- Put fat/shaded jars for the common versions into "flink-shaded" and offer
  them for download on the website, similar to the pre-bundled Hadoop
  versions.
- Look at the Presto code (Metastore protocol) and see if we can reuse that
  (see the sketch below).
- Have a setup helper script that takes the versions and pulls the required
  dependencies.

Can you share how a "built-in" dependency could work, if there are so many
different conflicting versions?

Thanks,
Stephan
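For context on the Presto option: that integration boils down to speaking
only the metastore thrift protocol, with no dependency on hive-exec or the
rest of hive/lib. A minimal sketch of what that could look like, assuming
the standard Hive metastore client API and an illustrative thrift URI (a
sketch, not a definitive implementation):

    // Sketch only: metastore-only integration in the spirit of Presto,
    // using the standard Hive metastore thrift client. The URI below is
    // illustrative.
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.Table;

    public class MetastoreOnlyExample {
        public static void main(String[] args) throws Exception {
            HiveConf conf = new HiveConf();
            // Talk to the metastore over thrift; no hive/lib jars are
            // needed beyond the metastore client itself.
            conf.setVar(HiveConf.ConfVars.METASTOREURIS,
                    "thrift://metastore-host:9083");

            HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
            try {
                // List databases and tables purely via the thrift API.
                for (String db : client.getAllDatabases()) {
                    for (String tableName : client.getAllTables(db)) {
                        Table t = client.getTable(db, tableName);
                        System.out.println(db + "." + tableName
                                + " -> " + t.getSd().getLocation());
                    }
                }
            } finally {
                client.close();
            }
        }
    }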
On Tue, Feb 4, 2020 at 12:59 PM Rui Li <li...@apache.org> wrote:

> Hi Stephan,
>
> As Jingsong stated, in our documentation the recommended way to add Hive
> deps is to use exactly what users have installed. It's just that we ask
> users to manually add those jars, instead of automatically finding them
> based on env variables. I prefer to keep it this way for a while, and see
> if there are real concerns/complaints in user feedback.
>
> Please also note the Hive jars are not the only ones needed to integrate
> with Hive; users have to make sure flink-connector-hive and the Hadoop
> jars are in the classpath too. So I'm afraid a single "HIVE" env variable
> wouldn't save all the manual work for our users.
>
> On Tue, Feb 4, 2020 at 5:54 PM Jingsong Li <jingsongl...@gmail.com> wrote:
>
> > Hi all,
> >
> > For your information, we have documented the detailed dependency
> > information [1]. I think it's a lot clearer than before, but it's still
> > worse than Presto and Spark (they avoid, or have built in, the Hive
> > dependency).
> >
> > I thought about Stephan's suggestion:
> > - hive/lib has 200+ jars, but we only need hive-exec.jar, or at most
> > two or three more; if so many jars are introduced, there may be big
> > conflicts.
> > - And hive/lib is not available on every machine, so we would need to
> > upload that many jars.
> > - A separate classloader may be hard to make work too: our
> > flink-connector-hive needs the hive jars, so we may need to treat the
> > flink-connector-hive jar specially as well.
> > CC: Rui Li
> >
> > I think the system that integrates best with Hive is Presto, which only
> > connects to the Hive metastore through the thrift protocol. But I
> > understand that it would cost a lot to rewrite the code.
> >
> > [1]
> > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Feb 4, 2020 at 1:44 AM Stephan Ewen <se...@apache.org> wrote:
> >
> >> We have had much trouble in the past from "too deep, too custom"
> >> integrations that everyone got out of the box, i.e., Hadoop.
> >> Flink has such a broad spectrum of use cases; if we have a custom
> >> build for every other framework in that spectrum, we'll be in trouble.
> >>
> >> So I would also be -1 for custom builds.
> >>
> >> Couldn't we do something similar to what we started doing for Hadoop?
> >> Moving away from convenience downloads to allowing users to "export"
> >> their setup for Flink?
> >>
> >> - We can have a "hive module (loader)" in flink/lib by default.
> >> - The module loader would look for an environment variable like
> >> "HIVE_CLASSPATH" and load these classes (ideally in a separate
> >> classloader).
> >> - The loader can search for certain classes and, when it finds them,
> >> instantiate the catalog / functions / etc. and the hive module
> >> referencing them.
> >> - That way, we use exactly what users have installed, without needing
> >> to build our own bundles.
> >>
> >> Could that work?
> >>
> >> Best,
> >> Stephan
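To make the module-loader idea quoted above concrete: a rough sketch of how
such a loader could probe a HIVE_CLASSPATH variable and set up Hive support
in a separate classloader. The environment variable name and the probed
class are assumptions for illustration, not an existing Flink API:

    // Sketch of the "hive module (loader)" idea quoted above.
    // HIVE_CLASSPATH and the probed marker class are assumptions.
    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;

    public class HiveModuleLoader {

        public static ClassLoader loadHiveClasspath() throws Exception {
            String hiveClasspath = System.getenv("HIVE_CLASSPATH");
            if (hiveClasspath == null || hiveClasspath.isEmpty()) {
                return null; // Hive integration not configured.
            }
            // Turn the path entries into URLs for an isolated child
            // classloader, so Hive's dependencies don't leak into
            // Flink's own classpath.
            List<URL> urls = new ArrayList<>();
            for (String entry : hiveClasspath.split(File.pathSeparator)) {
                urls.add(new File(entry).toURI().toURL());
            }
            return new URLClassLoader(
                    urls.toArray(new URL[0]),
                    HiveModuleLoader.class.getClassLoader());
        }

        public static void main(String[] args) throws Exception {
            ClassLoader hiveLoader = loadHiveClasspath();
            if (hiveLoader == null) {
                System.out.println("HIVE_CLASSPATH not set, Hive module disabled.");
                return;
            }
            // Probe for a marker class; if present, the loader would go
            // on to instantiate the catalog / functions reflectively.
            try {
                Class<?> marker = Class.forName(
                        "org.apache.hadoop.hive.metastore.api.Table",
                        false, hiveLoader);
                System.out.println("Found Hive classes: " + marker.getName()
                        + ", instantiating hive module...");
            } catch (ClassNotFoundException e) {
                System.out.println("HIVE_CLASSPATH set, but Hive classes not found.");
            }
        }
    }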
> >>
> >> On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann <trohrm...@apache.org>
> >> wrote:
> >>
> >> > Couldn't it simply be documented which jars are in the convenience
> >> > jars which are pre-built and can be downloaded from the website?
> >> > Then people who need a custom version know which jars they need to
> >> > provide to Flink?
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> > On Tue, Dec 17, 2019 at 6:49 PM Bowen Li <bowenl...@gmail.com> wrote:
> >> >
> >> > > I'm not sure providing an uber jar would be possible.
> >> > >
> >> > > Unlike the kafka and elasticsearch connectors, which have
> >> > > dependencies on a specific kafka/elastic version, or the universal
> >> > > kafka connector, which provides good compatibility, the hive
> >> > > connector needs to deal with hive jars across all 1.x, 2.x, and
> >> > > 3.x versions (let alone all the HDP/CDH distributions), with
> >> > > incompatibilities even between minor versions, and with
> >> > > differently versioned hadoop and other extra dependency jars for
> >> > > each hive version.
> >> > >
> >> > > Besides, users usually need to be able to easily see which
> >> > > individual jars are required, which is invisible in an uber jar.
> >> > > Hive users already have their hive deployments. They usually have
> >> > > to use their own hive jars because, unlike the hive jars on mvn,
> >> > > their own jars contain changes made in-house or by vendors. They
> >> > > need to easily tell which jars Flink requires for the open-source
> >> > > hive version corresponding to their own hive deployment, and copy
> >> > > the in-house jars over from their hive deployments as
> >> > > replacements.
> >> > >
> >> > > Providing a script to download all the individual jars for a
> >> > > specified hive version can be an alternative.
> >> > >
> >> > > The goal is that we need to provide a *product*, not a technology,
> >> > > to make things less of a hassle for Hive users. After all, it's
> >> > > Flink embracing the Hive community and ecosystem, not the other
> >> > > way around. I'd argue the Hive connector can be treated
> >> > > differently because its community/ecosystem/user base is much
> >> > > larger than those of the other connectors, and it's way more
> >> > > important than other connectors to Flink's mission of becoming a
> >> > > unified batch/streaming engine and getting Flink more widely
> >> > > adopted.
> >> > >
> >> > > On Sun, Dec 15, 2019 at 10:03 PM Danny Chan <yuzhao....@gmail.com>
> >> > > wrote:
> >> > >
> >> > > > Also -1 on separate builds.
> >> > > >
> >> > > > After looking at how some other big data engines handle
> >> > > > distribution [1], I didn't find a strong need to publish a
> >> > > > separate build just for a specific Hive version; there are,
> >> > > > however, builds for different Hadoop versions.
> >> > > >
> >> > > > Just like Seth and Aljoscha said, we could publish a
> >> > > > flink-hive-version-uber.jar to use as a lib for the SQL CLI or
> >> > > > other use cases.
> >> > > >
> >> > > > [1] https://spark.apache.org/downloads.html
> >> > > > [2] https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> >> > > >
> >> > > > Best,
> >> > > > Danny Chan
> >> > > > On Dec 14, 2019 at 3:03 AM +0800, dev@flink.apache.org wrote:
> >> > > > >
> >> > > > > https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#dependencies
> >
> > --
> > Best, Jingsong Lee
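As an illustration of the setup-helper option from the top of this thread
(also raised by Bowen above as a download script): a minimal sketch of a
helper that pulls the jars for a chosen Hive version from Maven Central.
The version-to-artifact mapping below is a placeholder; a real helper would
ship a vetted jar list per Hive version:

    // Sketch of a setup helper that downloads the individual jars for a
    // chosen Hive version. The artifact lists are illustrative
    // placeholders, not a vetted dependency set.
    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;
    import java.util.List;
    import java.util.Map;

    public class HiveDepsDownloader {

        private static final String MAVEN_CENTRAL =
                "https://repo1.maven.org/maven2/";

        // Hypothetical mapping: Hive version -> "groupId:artifactId:version".
        private static final Map<String, List<String>> DEPS = Map.of(
                "2.3.6", List.of(
                        "org.apache.hive:hive-exec:2.3.6",
                        "org.apache.hive:hive-metastore:2.3.6"),
                "3.1.2", List.of(
                        "org.apache.hive:hive-exec:3.1.2"));

        public static void main(String[] args) throws Exception {
            String hiveVersion = args.length > 0 ? args[0] : "2.3.6";
            Path libDir = Paths.get("lib");
            Files.createDirectories(libDir);

            for (String coord : DEPS.get(hiveVersion)) {
                String[] gav = coord.split(":");
                // Maven repo layout:
                // group/with/slashes/artifact/version/artifact-version.jar
                String url = MAVEN_CENTRAL + gav[0].replace('.', '/') + "/"
                        + gav[1] + "/" + gav[2] + "/"
                        + gav[1] + "-" + gav[2] + ".jar";
                Path target = libDir.resolve(gav[1] + "-" + gav[2] + ".jar");
                System.out.println("Downloading " + url);
                try (InputStream in = new URL(url).openStream()) {
                    Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }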