Re: [PROGRESS UPDATE] [DISCUSS] Flink-Hive Integration and Catalogs

Shaoxuan Wang Tue, 19 Mar 2019 20:45:23 -0700

Hi Bowen,
Thanks for driving this. I am CCing this email/survey to
user-zh@flink.apache.org as well.
I have heard there is a lot of interest in Flink-Hive from the field. One of
the biggest requests Hive users have raised is "support for older Hive
versions". A large number of users are still working on CDH/HDP clusters
installed with old Hive versions, say 1.2.1/2.1.1. We need to ensure support
for these Hive versions when planning the work on Flink-Hive integration.


*@all: "We want to get your feedback on Flink-Hive integration."*

Regards,
Shaoxuan

On Wed, Mar 20, 2019 at 7:16 AM Bowen Li <bowenl...@gmail.com> wrote:

> Hi Flink users and devs,
>
> We want to get your feedback on integrating Flink with Hive.
>
> Background: At Flink Forward in Beijing last December, the community
> announced that it would initiate efforts to integrate Flink and Hive. At the
> Feb 21 Seattle Flink Meetup
> <https://www.meetup.com/seattle-flink/events/258723322/>, we presented
> Integrating Flink with Hive
> <https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-xuefu-zhang-and-bowen-li-seattle-flink-meetup-feb-2019>
> with a live demo to the local community and got a great response. As of
> mid-March, we have internally finished building Flink's brand-new catalog
> infrastructure, metadata integration with Hive, and the most common cases of
> Flink reading/writing against Hive, and will start to submit more design
> docs/FLIPs and contribute code back to the community. The reason for doing it
> internally first and then in the community is to ensure our proposed solutions
> are fully validated and tested, to gain hands-on experience, and to not miss
> anything in the design. You are very welcome to join this effort, from
> design/code review to development and testing.
>
> *The most important thing we believe you, our Flink users/devs, can help
> with RIGHT NOW is to share your Hive use cases and give us feedback on this
> project. As we start to go deeper into specific areas of integration, your
> feedback and suggestions will help us refine our backlog and
> prioritize our work, and you can get the features you want sooner!* Just
> as an example, if most users are mainly reading Hive data, then we can
> prioritize tuning read performance over implementing write capability.
>
> A quick review of what we've finished building internally and are ready to
> contribute back to the community:
>
>    - Flink/Hive Metadata Integration
>       - Unified, pluggable catalog infra that manages meta-objects,
>       including catalogs, databases, tables, views, functions, partitions,
>       table/partition stats
>       - Three catalog impls - an in-memory catalog, HiveCatalog for
>       embracing the Hive ecosystem, GenericHiveMetastoreCatalog for persisting
>       Flink's streaming/batch metadata in the Hive metastore
>       - Hierarchical metadata reference as
>       <catalog_name>.<database_name>.<metaobject_name> in SQL and Table API
>       - Unified function catalog based on the new catalog infra, which also
>       supports Hive simple UDFs
>    - Flink/Hive Data Integration
>       - Hive data connector that reads partitioned/non-partitioned Hive
>       tables, and supports partition pruning, both Hive simple and complex
>       data types, and basic write
>    - More powerful SQL Client fully integrated with the above features
>    and more Hive-compatible SQL syntax for better end-to-end SQL experience
>
> *Given the above info, we want to learn from you: How do you use Hive
> currently? How can we solve your pain points? What features do you expect
> from Flink-Hive integration? Those can be details like:*
>
>    - *Which Hive version are you using? Do you plan to upgrade Hive?*
>    - *Are you planning to switch your Hive engine? What timeline are you
>    looking at? What capabilities does Flink need before you will consider
>    using it with Hive?*
>    - *What's your motivation to try Flink-Hive? Maintain only one data
>    processing system across your teams for simplicity and maintainability?
>    Better performance of Flink over Hive itself?*
>    - *What are your Hive use cases? How large is your Hive data size? Do
>    you mainly do reading, or both reading and writing?*
>    - *How many Hive user-defined functions do you have? Are they mostly
>    UDFs, GenericUDFs, UDTFs, or UDAFs?*
>    - Any questions or suggestions you have, or something as simple as how
>    you feel about the project
>
> Again, your input will be really valuable to us, and we hope, with all of
> us working together, the project can benefit our end users. Please feel
> free to either reply to this thread or just to me. I'm also working on
> creating a questionnaire to better gather your feedback; watch the
> mailing list over the next couple of days.
>
> Thanks,
> Bowen
>
>
>
>
>
