Hi Bowen,

Thanks for driving this. I am CCing this email/survey to user-zh@flink.apache.org as well. I have heard there is a lot of interest in Flink-Hive from the field. One of the biggest requests Hive users have raised is "support for out-of-date Hive versions". A large number of users are still working on clusters with CDH/HDP installed with an old Hive version, say 1.2.1/2.1.1. We need to ensure support for these Hive versions when planning the work on Flink-Hive integration.
*@all: "We want to get your feedback on Flink-Hive integration."*

Regards,
Shaoxuan

On Wed, Mar 20, 2019 at 7:16 AM Bowen Li <bowenl...@gmail.com> wrote:

> Hi Flink users and devs,
>
> We want to get your feedback on integrating Flink with Hive.
>
> Background: At Flink Forward in Beijing last December, the community
> announced that it would start integrating Flink and Hive. At the Feb 21
> Seattle Flink Meetup
> <https://www.meetup.com/seattle-flink/events/258723322/>, we presented
> Integrating Flink with Hive
> <https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-xuefu-zhang-and-bowen-li-seattle-flink-meetup-feb-2019>
> with a live demo to the local community and got a great response. As of
> mid-March, we have internally finished building Flink's brand-new
> catalog infrastructure, metadata integration with Hive, and the most
> common cases of Flink reading/writing against Hive, and will start to
> submit more design docs/FLIPs and contribute code back to the community.
> The reason for doing it internally first and then in the community is to
> ensure our proposed solutions are fully validated and tested, to gain
> hands-on experience, and to not miss anything in the design. You are
> very welcome to join this effort, from design/code review to development
> and testing.
>
> *The most important thing we believe you, our Flink users/devs, can help
> with RIGHT NOW is to share your Hive use cases and give us feedback on
> this project. As we start to go deeper into specific areas of
> integration, your feedback and suggestions will help us refine our
> backlog and prioritize our work, and you can get the features you want
> sooner!* For example, if most users are mainly reading Hive data, then
> we can prioritize tuning read performance over implementing write
> capability.
>
> A quick review of what we've finished building internally and are ready
> to contribute back to the community:
>
> - Flink/Hive Metadata Integration
>    - Unified, pluggable catalog infra that manages meta-objects,
>      including catalogs, databases, tables, views, functions,
>      partitions, and table/partition stats
>    - Three catalog impls: an in-memory catalog, HiveCatalog for
>      embracing the Hive ecosystem, and GenericHiveMetastoreCatalog for
>      persisting Flink's streaming/batch metadata in the Hive metastore
>    - Hierarchical metadata reference as
>      <catalog_name>.<database_name>.<metaobject_name> in SQL and Table API
>    - Unified function catalog based on the new catalog infra, which also
>      supports Hive simple UDFs
> - Flink/Hive Data Integration
>    - Hive data connector that reads partitioned/non-partitioned Hive
>      tables and supports partition pruning, both Hive simple and complex
>      data types, and basic writes
> - More powerful SQL Client fully integrated with the above features,
>   and more Hive-compatible SQL syntax for a better end-to-end SQL
>   experience
>
> *Given the above info, we want to learn from you: How do you use Hive
> currently? How can we solve your pain points? What features do you
> expect from Flink-Hive integration? Those can be details like:*
>
> - *Which Hive version are you using? Do you plan to upgrade Hive?*
> - *Are you planning to switch Hive engines? What timeline are you
>   looking at? What capabilities does Flink need before you would
>   consider using Flink with Hive?*
> - *What's your motivation to try Flink-Hive? Maintaining only one data
>   processing system across your teams for simplicity and
>   maintainability? Better performance of Flink over Hive itself?*
> - *What are your Hive use cases? How large is your Hive data size? Do
>   you mainly do reading, or both reading and writing?*
> - *How many Hive user-defined functions do you have? Are they mostly
>   UDF, GenericUDF, UDTF, or UDAF?*
> - Any questions or suggestions you have, or something as simple as how
>   you feel about the project
>
> Again, your input will be really valuable to us, and we hope that, with
> all of us working together, the project can benefit our end users.
> Please feel free to either reply to this thread or just to me. I'm also
> working on creating a questionnaire to better gather your feedback;
> watch the mailing list for it in the next couple of days.
>
> Thanks,
> Bowen