I was going to suggest the same thing as Seth. So yes, I’m against having Flink distributions that contain Hive, but in favor of convenience downloads, as we have for Hadoop.
Best,
Aljoscha

> On 13. Dec 2019, at 18:04, Seth Wiesman <sjwies...@gmail.com> wrote:
>
> I'm also -1 on separate builds.
>
> What about publishing convenience jars that contain the dependencies for
> each version? For example, there could be a flink-hive-1.2.1-uber.jar,
> containing all the necessary dependencies to connect to that Hive version,
> that users could just add to their lib folder.
>
>
> On Fri, Dec 13, 2019 at 8:50 AM Robert Metzger <rmetz...@apache.org> wrote:
>
>> I'm generally not opposed to convenience binaries if a huge number of
>> people would benefit from them and the overhead for the Flink project is
>> low. I have not seen a huge demand for such binaries yet (not for the
>> Flink + Hive integration either). Looking at Apache Spark, they also only
>> offer convenience binaries for Hadoop.
>>
>> Maybe we could provide a "Docker Playground" for Flink + Hive in the
>> documentation (and the flink-playgrounds.git repo)?
>> (similar to
>> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink-operations-playground.html
>> )
>>
>>
>>
>> On Fri, Dec 13, 2019 at 3:04 PM Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> -1
>>>
>>> We shouldn't need to deploy additional binaries to have a feature be
>>> remotely usable.
>>> This usually points to something else being done incorrectly.
>>>
>>> If it is indeed such a hassle to set up Hive on Flink, then my conclusion
>>> would be that either
>>> a) the documentation needs to be improved
>>> b) the architecture needs to be improved
>>> or, if all else fails, c) we provide a utility script to make setup easier.
>>>
>>> We spent a lot of time on reducing the number of binaries in the Hadoop
>>> days, and also went to extra lengths to avoid a separate Java 11 binary,
>>> and I see no reason why Hive should get special treatment on this matter.
>>>
>>> Regards,
>>> Chesnay
>>>
>>> On 13/12/2019 09:44, Bowen Li wrote:
>>>> Hi all,
>>>>
>>>> I want to propose having a couple of separate Flink distributions with
>>>> Hive dependencies for specific Hive versions (2.3.4 and 1.2.1). The
>>>> distributions would be provided to users on the Flink download page [1].
>>>>
>>>> A few reasons to do this:
>>>>
>>>> 1) Flink-Hive integration is important to many Flink and Hive users in
>>>> two dimensions:
>>>>     a) for Flink metadata: HiveCatalog is the only persistent catalog to
>>>> manage Flink tables. With Flink 1.10 supporting more DDL, the persistent
>>>> catalog will play an even more critical role in users' workflows.
>>>>     b) for Flink data: the Hive data connector (source/sink) helps both
>>>> Flink and Hive users unlock new use cases in streaming,
>>>> near-realtime/realtime data warehousing, backfill, etc.
>>>>
>>>> 2) Currently users have to go through a *really* tedious process to get
>>>> started, because it requires lots of extra jars (see [2]) that are absent
>>>> from Flink's lean distribution. We've had so many users from the public
>>>> mailing list, private email, and DingTalk groups who got frustrated after
>>>> spending lots of time figuring out the jars themselves. They would rather
>>>> have a more "right out of the box" quickstart experience, and play with
>>>> the catalog and source/sink without hassle.
>>>>
>>>> 3) It's easier for users to replace those Hive dependencies with their own
>>>> Hive versions: just replace those jars with the right versions, with no
>>>> need to consult the docs.
>>>>
>>>> * Hive 2.3.4 and 1.2.1 are two versions that represent a large part of
>>>> the user base out there, and that's why we are using them as examples for
>>>> dependencies in [2], even though we now support almost all Hive
>>>> versions [3].
>>>>
>>>> I want to hear what the community thinks about this, and how to achieve it
>>>> if we believe that's the way to go.
>>>>
>>>> Cheers,
>>>> Bowen
>>>>
>>>> [1] https://flink.apache.org/downloads.html
>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
>>>> [3] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions