I was going to suggest the same thing as Seth. So yes, I’m against having Flink
distributions that contain Hive, but in favor of convenience downloads like the
ones we have for Hadoop.
Best,
Aljoscha

> On 13. Dec 2019, at 18:04, Seth Wiesman <sjwies...@gmail.com> wrote:
> 
> I'm also -1 on separate builds.
> 
> What about publishing convenience jars that contain the dependencies for
> each version? For example, there could be a flink-hive-1.2.1-uber.jar that
> users could just add to their lib folder that contains all the necessary
> dependencies to connect to that hive version.
> 
> 
> On Fri, Dec 13, 2019 at 8:50 AM Robert Metzger <rmetz...@apache.org> wrote:
> 
>> I'm generally not opposed to convenience binaries, if a huge number of
>> people would benefit from them, and the overhead for the Flink project is
>> low. I have not seen a huge demand for such binaries yet (nor for the
>> Flink + Hive integration). Looking at Apache Spark, they also offer
>> convenience binaries only for Hadoop.
>> 
>> Maybe we could provide a "Docker Playground" for Flink + Hive in the
>> documentation (and the flink-playgrounds.git repo)?
>> (similar to
>> https://ci.apache.org/projects/flink/flink-docs-master/getting-started/docker-playgrounds/flink-operations-playground.html)
>> 
>> 
>> 
>> On Fri, Dec 13, 2019 at 3:04 PM Chesnay Schepler <ches...@apache.org>
>> wrote:
>> 
>>> -1
>>> 
>>> We shouldn't need to deploy additional binaries to have a feature be
>>> remotely usable.
>>> This usually points to something else being done incorrectly.
>>> 
>>> If it is indeed such a hassle to set up Hive on Flink, then my conclusion
>>> would be that either
>>> a) the documentation needs to be improved
>>> b) the architecture needs to be improved
>>> or, if all else fails c) provide a utility script to make setup easier.
>>> 
>>> We spent a lot of time reducing the number of binaries in the Hadoop
>>> days, and also went to extra lengths to avoid a separate Java 11 binary,
>>> and I see no reason why Hive should get special treatment on this matter.
>>> 
>>> Regards,
>>> Chesnay
>>> 
>>> On 13/12/2019 09:44, Bowen Li wrote:
>>>> Hi all,
>>>> 
>>>> I want to propose having a couple of separate Flink distributions with
>>>> Hive dependencies for specific Hive versions (2.3.4 and 1.2.1). The
>>>> distributions will be provided to users on the Flink download page [1].
>>>> 
>>>> A few reasons to do this:
>>>> 
>>>> 1) Flink-Hive integration is important to many, many Flink and Hive
>>>> users in two dimensions:
>>>>      a) for Flink metadata: HiveCatalog is the only persistent catalog
>>>> to manage Flink tables. With Flink 1.10 supporting more DDL, the
>>>> persistent catalog will play an even more critical role in users'
>>>> workflows
>>>>      b) for Flink data: the Hive data connector (source/sink) helps both
>>>> Flink and Hive users unlock new use cases in streaming,
>>>> near-realtime/realtime data warehousing, backfill, etc.
>>>> 
>>>> 2) currently users have to go through a *really* tedious process to get
>>>> started, because it requires lots of extra jars (see [2]) that are
>>>> absent from Flink's lean distribution. We've had so many users from the
>>>> public mailing list, private email, and DingTalk groups who got
>>>> frustrated spending lots of time figuring out the jars themselves. They
>>>> would rather have a more "right out of the box" quickstart experience,
>>>> and play with the catalog and source/sink without hassle.
>>>> 
>>>> 3) it's easier for users to swap in the Hive dependencies for their own
>>>> Hive versions - just replace those jars with the right versions, with no
>>>> need to find the doc.
>>>> 
>>>> * Hive 2.3.4 and 1.2.1 are two versions that represent a large user
>>>> base out there, and that's why we are using them as examples for
>>>> dependencies in [1] even though we now support almost all Hive
>>>> versions [3].
>>>> 
>>>> I want to hear what the community thinks about this, and how to achieve
>>>> it if we believe that's the way to go.
>>>> 
>>>> Cheers,
>>>> Bowen
>>>> 
>>>> [1] https://flink.apache.org/downloads.html
>>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
>>>> [3] https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#supported-hive-versions
>>>> 
>>> 
>>> 
>> 
