I'm also -1 on separate builds.

What about publishing convenience jars that contain the dependencies for
each version? For example, there could be a flink-hive-1.2.1-uber.jar,
containing all the necessary dependencies to connect to that Hive version,
that users could just drop into their lib folder.
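
Just to illustrate the intended user experience (a rough sketch on my
part, not something taken from the docs): once such an uber jar sits in
lib/, registering the HiveCatalog from the Table API should be all that is
left. The catalog name, default database, conf dir, and version below are
placeholders, and I'm assuming the Flink 1.10-era Table API:

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.catalog.hive.HiveCatalog;

    public class HiveCatalogQuickstart {
        public static void main(String[] args) {
            // Blink planner in batch mode (Flink 1.10-era settings).
            EnvironmentSettings settings = EnvironmentSettings.newInstance()
                    .useBlinkPlanner()
                    .inBatchMode()
                    .build();
            TableEnvironment tableEnv = TableEnvironment.create(settings);

            // Placeholder values: catalog name, default database,
            // directory containing hive-site.xml, and Hive version.
            HiveCatalog hive =
                    new HiveCatalog("myhive", "default", "/opt/hive-conf", "1.2.1");
            tableEnv.registerCatalog("myhive", hive);
            tableEnv.useCatalog("myhive");

            // Tables in the Hive Metastore are now visible to Flink.
            for (String table : tableEnv.listTables()) {
                System.out.println(table);
            }
        }
    }

When the jars are missing or mismatched, it is exactly these few lines
that fail with class-loading errors, which is why bundling the
dependencies per Hive version looks attractive to me.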


On Fri, Dec 13, 2019 at 8:50 AM Robert Metzger <rmetz...@apache.org> wrote:

> I'm generally not opposed to convenience binaries if a huge number of
> people would benefit from them and the overhead for the Flink project is
> low. I have not seen a huge demand for such binaries yet (not even for the
> Flink + Hive integration). Looking at Apache Spark, they also only offer
> convenience binaries for Hadoop.
>
> Maybe we could provide a "Docker Playground" for Flink + Hive in the
> documentation (and the flink-playgrounds.git repo)?
> (similar to
>
> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink-operations-playground.html
> )
>
>
>
> On Fri, Dec 13, 2019 at 3:04 PM Chesnay Schepler <ches...@apache.org>
> wrote:
>
> > -1
> >
> > We shouldn't need to deploy additional binaries to make a feature
> > remotely usable.
> > This usually points to something else being done incorrectly.
> >
> > If it is indeed such a hassle to set up Hive on Flink, then my conclusion
> > would be that either
> > a) the documentation needs to be improved,
> > b) the architecture needs to be improved,
> > or, if all else fails, c) we provide a utility script that makes setting
> > it up easier.
> >
> > We spent a lot of time reducing the number of binaries in the Hadoop
> > days, and also went to extra lengths to avoid a separate Java 11 binary,
> > and I see no reason why Hive should get special treatment on this matter.
> >
> > Regards,
> > Chesnay
> >
> > On 13/12/2019 09:44, Bowen Li wrote:
> > > Hi all,
> > >
> > > I want to propose having a couple of separate Flink distributions with
> > > Hive dependencies for specific Hive versions (2.3.4 and 1.2.1). The
> > > distributions will be provided to users on the Flink download page [1].
> > >
> > > A few reasons to do this:
> > >
> > > 1) Flink-Hive integration is important to many Flink and Hive users in
> > > two dimensions:
> > >       a) for Flink metadata: HiveCatalog is the only persistent catalog
> > > for managing Flink tables. With Flink 1.10 supporting more DDL, the
> > > persistent catalog will play an even more critical role in users'
> > > workflows.
> > >       b) for Flink data: the Hive data connector (source/sink) helps
> > > both Flink and Hive users unlock new use cases in streaming,
> > > near-realtime/realtime data warehousing, backfill, etc.
> > >
> > > 2) currently users have to go through a *really* tedious process to get
> > > started, because it requires lots of extra jars (see [2]) that are
> > > absent from Flink's lean distribution. We've had many users from the
> > > public mailing list, private emails, and DingTalk groups who got
> > > frustrated at spending lots of time figuring out the jars themselves.
> > > They would rather have a more "right out of the box" quickstart
> > > experience, and play with the catalog and source/sink without hassle.
> > >
> > > 3) it's easier for users to swap in the Hive dependencies matching
> > > their own Hive versions - they just replace those jars with the right
> > > versions, with no need to dig through the docs.
> > >
> > > * Hive 2.3.4 and 1.2.1 are two versions that cover a large share of the
> > > user base out there, which is why we use them as examples for the
> > > dependencies in [1], even though we now support almost all Hive
> > > versions [3].
> > >
> > > I want to hear what the community thinks about this, and how to achieve
> > > it if we believe that's the way to go.
> > >
> > > Cheers,
> > > Bowen
> > >
> > > [1] https://flink.apache.org/downloads.html
> > > [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
> > > [3] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions
> > >
> >
> >
>
