Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-11 Thread Stephan Ewen
IIRC, Guowei wants to work on supporting Table API connectors in Plugins. With that, we could have the Hive dependency as a plugin, avoiding dependency conflicts. On Thu, Feb 6, 2020 at 1:11 PM Jingsong Li wrote: > Hi Stephan, > > Good idea. Just like hadoop, we can have flink-shaded-hive-uber.

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-06 Thread Jingsong Li
Hi Stephan, Good idea. Just like hadoop, we can have flink-shaded-hive-uber. Then getting started with the hive integration will be very simple: with one or two pre-bundled versions, users just add these dependencies: - flink-connector-hive.jar - flink-shaded-hive-uber-.jar Some changes are needed, but I think it sho

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-06 Thread Stephan Ewen
Hi Jingsong! It sounds like with two pre-bundled versions (hive 1.2.1 and hive 2.3.6) you can cover a lot of versions. Would it make sense to add these to flink-shaded (with proper exclusions of unnecessary dependencies) and offer them as a download, similar to how we offer pre-shaded Ha

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-06 Thread Jingsong Li
Hi Stephan, The hive/lib/ directory has many jars; it covers execution, the metastore, the hive client, and everything else. What we really depend on is hive-exec.jar (hive-metastore.jar is also required for older hive versions). And hive-exec.jar is an uber jar. We only want about half of its classes. These half classes a

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-05 Thread Stephan Ewen
Some thoughts about other options we have: - Put fat/shaded jars for the common versions into "flink-shaded" and offer them for download on the website, similar to pre-bundled Hadoop versions. - Look at the Presto code (Metastore protocol) and see if we can reuse that - Have a setup helper

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-04 Thread Rui Li
Hi Stephan, As Jingsong stated, in our documentation the recommended way to add Hive deps is to use exactly what users have installed. It's just that we ask users to manually add those jars, instead of automatically finding them based on env variables. I prefer to keep it this way for a while, and see if

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-04 Thread Jingsong Li
Hi all, For your information, we have documented the dependencies in detail [1]. I think it's a lot clearer than before, but it's still worse than presto and spark (they avoid or have built-in hive dependency). I thought about Stephan's suggestion: - The hive/lib has 200+ jars, but we only nee

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-03 Thread Stephan Ewen
We have had much trouble in the past from "too deep too custom" integrations that everyone got out of the box, i.e., Hadoop. Flink has such a broad spectrum of use cases; if we have a custom build for every other framework in that spectrum, we'll be in trouble. So I would also be -1 for custom b

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-18 Thread Till Rohrmann
Couldn't it simply be documented which jars are in the convenience jars which are pre built and can be downloaded from the website? Then people who need a custom version know which jars they need to provide to Flink? Cheers, Till On Tue, Dec 17, 2019 at 6:49 PM Bowen Li wrote: > I'm not sure pr

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-17 Thread Bowen Li
I'm not sure providing an uber jar would be possible. Unlike the kafka and elasticsearch connectors, which have dependencies for a specific kafka/elastic version, or the kafka universal connector, which provides good compatibility, the hive connector needs to deal with hive jars in all 1.x, 2.x, 3.x v

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-15 Thread Danny Chan
Also -1 on separate builds. After looking at how some other BigData engines handle distribution[1], I didn't find a strong need to publish a separate build just for a separate Hive version, although there are indeed builds for different Hadoop versions. Just like Seth and Aljoscha said, we could push a flink-hi

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-15 Thread Jingsong Li
Thanks all for explaining. I misunderstood the original proposal. -1 to put them in our distributions +1 to provide hive uber jars, as Seth and Aljoscha advise. Hive is just a connector no matter how important it is. So I totally agree that we shouldn't put them in our distributions. We can st

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-14 Thread Jark Wu
I agree with Seth and Aljoscha and think that is the right way to go. We already provide uber jars for kafka and elasticsearch for an out-of-the-box experience; you can see the download links on this page[1]. Users can easily download the connectors and versions they like and drop them into the SQL CLI lib directories. The u

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Aljoscha Krettek
I was going to suggest the same thing as Seth. So yes, I’m against having Flink distributions that contain Hive, but in favor of convenience downloads as we have for Hadoop. Best, Aljoscha > On 13. Dec 2019, at 18:04, Seth Wiesman wrote: > > I'm also -1 on separate builds. > > What about publishing c

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Seth Wiesman
I'm also -1 on separate builds. What about publishing convenience jars that contain the dependencies for each version? For example, there could be a flink-hive-1.2.1-uber.jar that users could just add to their lib folder that contains all the necessary dependencies to connect to that hive version.

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Robert Metzger
I'm generally not opposed to convenience binaries, if a huge number of people would benefit from them, and the overhead for the Flink project is low. I did not see a huge demand for such binaries yet (neither for the Flink + Hive integration). Looking at Apache Spark, they are also only offering co

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Chesnay Schepler
-1 We shouldn't need to deploy additional binaries to have a feature be remotely usable. This usually points to something else being done incorrectly. If it is indeed such a hassle to set up hive on Flink, then my conclusion would be that either a) the documentation needs to be improved b) th

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Jingsong Li
Hi Bowen, Thanks for driving this. +1 for this proposal. Due to our multi-version support, users are required to rely on different dependencies, which does break the "out of box" experience. Now that the client uses child-first class loading by default, it puts forward higher

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Terry Wang
Hi Bowen~ Thanks for driving this. I tried using the sql client with the hive connector about two weeks ago, and in my experience it was painful to set up the environment. +1 for this proposal. Best, Terry Wang > On Dec 13, 2019, at 16:44, Bowen Li wrote: > > Hi all, > > I want to propose to have a coupl

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Jeff Zhang
+1, this is definitely necessary for a better user experience. Setting up the environment is always painful for many big data tools. On Fri, Dec 13, 2019 at 5:02 PM Bowen Li wrote: > cc user ML in case anyone wants to chime in > > On Fri, Dec 13, 2019 at 00:44 Bowen Li wrote: > >> Hi all, >> >> I want to propose

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Bowen Li
cc user ML in case anyone wants to chime in On Fri, Dec 13, 2019 at 00:44 Bowen Li wrote: > Hi all, > > I want to propose to have a couple separate Flink distributions with Hive > dependencies on specific Hive versions (2.3.4 and 1.2.1). The distributions > will be provided to users on Flink down

[DISCUSS] have separate Flink distributions with built-in Hive dependencies

2019-12-13 Thread Bowen Li
Hi all, I want to propose to have a couple separate Flink distributions with Hive dependencies on specific Hive versions (2.3.4 and 1.2.1). The distributions will be provided to users on Flink download page [1]. A few reasons to do this: 1) Flink-Hive integration is important to many many Flink