Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-09 Thread Timo Walther
Hi Bowen, thanks for your feedback. We should not change the Google doc anymore but apply additional comments in the wiki page. I will also add a bit more explanation to some parts so that people know about certain design decisions. Regards, Timo On 08.01.19 at 22:54, Bowen Li wrote: Tha

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-08 Thread Bowen Li
Thank you, Xuefu and Timo, for putting together the FLIP! I like that both its scope and implementation plan are clear. Looking forward to feedback from the group. I also added a few more complementary details in the doc. Thanks, Bowen On Mon, Jan 7, 2019 at 8:37 PM Zhang, Xuefu wrote: > Thanks

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-07 Thread Zhang, Xuefu
Thanks, Timo! I have started to put the content from the Google doc into FLIP-30 [1]. However, please still keep the discussion along this thread. Thanks, Xuefu [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs --

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-07 Thread Timo Walther
Hi everyone, Xuefu and I had multiple iterations over the catalog design document [1]. I believe that it is in good shape now to be converted into a FLIP. Maybe we need a bit more explanation in some places but the general design would be ready now. The design document covers the following c

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-07 Thread Timo Walther
Hi Eron, thank you very much for the contributions. I merged the first little bug fixes. For the remaining PRs I think we can review and merge them soon. As you said, the code is agnostic to the details of the ExternalCatalog interface and I don't expect bigger merge conflicts in the near futu

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-06 Thread Eron Wright
Thanks Timo for merging a couple of the PRs. Are you also able to review the others that I mentioned? Xuefu I would like to incorporate your feedback too. Check out this short demonstration of using a catalog in SQL Client: https://asciinema.org/a/C8xuAjmZSxCuApgFgZQyeIHuo Thanks again! On Th

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-03 Thread Eron Wright
Would a couple folks raise their hand to make a review pass thru the 6 PRs listed above? It is a lovely stack of PRs that is 'all green' at the moment. I would be happy to open follow-on PRs to rapidly align with other efforts. Note that the code is agnostic to the details of the ExternalCatalo

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-02 Thread Eron Wright
I propose that the community review and merge the PRs that I posted, and then evolve the design thru 1.8 and beyond. I think having a basic infrastructure in place now will accelerate the effort, do you agree? Thanks again! On Wed, Jan 2, 2019 at 11:20 AM Zhang, Xuefu wrote: > Hi Eron, > > Ha

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-02 Thread Zhang, Xuefu
Hi Eron, Happy New Year! Thank you very much for your contribution, especially during the holidays. While I'm encouraged by your work, I'd also like to share my thoughts on how to move forward. First, please note that the design discussion is still finalizing, and we expect some moderate chang

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2019-01-01 Thread Eron Wright
Hi folks, there are clearly some incremental steps to be taken to introduce catalog support to SQL Client, complementary to what is proposed in the Flink-Hive Metastore design doc. I was quietly working on this over the holidays. I posted some new sub-tasks, PRs, and sample code to FLINK-10744. W

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-18 Thread Zhang, Xuefu
Hi Xiaowei, Thanks for bringing up the question. In the current design, the properties for meta objects are meant to cover anything that's specific to a particular catalog and agnostic to Flink. Anything that is common (such as schema for tables, query text for views, and UDF classname) is abs

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-17 Thread Xiaowei Jiang
Thanks Xuefu for the detailed design doc! One question on the properties associated with the catalog objects. Are we going to leave them completely free form or are we going to set some standard for that? I think that the answer may depend on whether we want to explore catalog specific optimization oppo

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-15 Thread Bowen Li
Thanks for keeping on improving the overall design, Xuefu! It looks quite good to me now. Would be nice that cc-ed Flink committers can help to review and confirm! One minor suggestion: Since the last section of design doc already touches some new sql statements, shall we add another section in

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-14 Thread Zhang, Xuefu
Thanks, Bowen, for catching the error. I have granted comment permission with the link. I also updated the doc with the latest class definitions. Everyone is encouraged to review and comment. Thanks, Xuefu -- Sender:Bowen Li Sen

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-13 Thread Bowen Li
Hi Xuefu, Currently the new design doc is in "view only" mode, and people cannot leave comments. Can you please change it to "can comment" or "can edit" mode? Thanks, Bowen On Mon, Nov 12, 2018 at 9:51 PM Zha

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-12 Thread Zhang, Xuefu
Hi Piotr, I have extracted the API portion of the design and the Google doc is here. Please review and provide your feedback. Thanks, Xuefu -- Sender:Xuefu Sent at:2018 Nov 12 (Mon) 12:43 Recipient:Piotr Nowojski ; dev Cc:Bowen

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-11 Thread Zhang, Xuefu
Hi Piotr, That sounds good to me. Let's close all the open questions (there are a couple of them) in the Google doc and I should be able to quickly split it into the three proposals as you suggested. Thanks, Xuefu -- Sender:Pio

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-09 Thread Piotr Nowojski
Hi, Yes, it seems like the best solution. Maybe someone else can also suggest if we can split it further? Maybe changes in the interface in one doc, reading from Hive metastore in another, and finally storing our meta information in Hive metastore? Piotrek > On 9 Nov 2018, at 01:44, Zhang, Xuef

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-08 Thread Zhang, Xuefu
Hi Piotr, That seems to be a good idea! Since the Google doc for the design is currently under extensive review, I will leave it as it is for now. However, I'll convert it to two different FLIPs when the time comes. How does it sound to you? Thanks, Xuefu -

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-08 Thread Piotr Nowojski
Hi, Maybe we should split this topic (and the design doc) into a couple of smaller ones, hopefully independent. The questions that you have asked Fabian have for example very little to do with reading metadata from Hive Meta Store? Piotrek > On 7 Nov 2018, at 14:27, Fabian Hueske wrote: > > H

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-07 Thread Fabian Hueske
Hi Xuefu and all, Thanks for sharing this design document! I'm very much in favor of restructuring / reworking the catalog handling in Flink SQL as outlined in the document. Most changes described in the design document seem to be rather general and not specifically related to the Hive integration

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-01 Thread Bowen Li
After taking a look at how other discussion threads work, I think it's actually fine to just keep our discussion here. It's up to you, Xuefu. The Google doc LGTM. I left some minor comments. On Thu, Nov 1, 2018 at 10:17 AM Bowen Li wrote: > Hi all, > > As Xuefu has published the design doc on goog

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-11-01 Thread Bowen Li
Hi all, As Xuefu has published the design doc on Google, I agree with Shuyi's suggestion that we probably should start a new email thread like "[DISCUSS] ... Hive integration design ..." on only the dev mailing list for community devs to review. The current thread is sent to both the dev and user lists. Thi

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Zhang, Xuefu
Hi Shuyi, Good idea. Actually the PDF was converted from a Google doc. Here is its link: https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing Once we reach an agreement, I can convert it to a FLIP. Thanks, Xuefu

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-31 Thread Shuyi Chen
Hi Xuefu, Thanks a lot for driving this big effort. I would suggest converting your proposal and design doc into a Google doc, and sharing it on the dev mailing list for the community to review and comment, with a title like "[DISCUSS] ... Hive integration design ...". Once approved, we can document it

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-30 Thread Zhang, Xuefu
Hi all, I have also shared a design doc on Hive metastore integration that is attached here and also to FLINK-10556[1]. Please kindly review and share your feedback. Thanks, Xuefu [1] https://issues.apache.org/jira/browse/FLINK-10556

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-24 Thread Zhang, Xuefu
Hi all, To wrap up the discussion, I have attached a PDF describing the proposal, which is also attached to FLINK-10556 [1]. Please feel free to watch that JIRA to track the progress. Please also let me know if you have additional comments or questions. Thanks, Xuefu [1] https://issues.apache

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-15 Thread Zhang, Xuefu
Hi Shuyi, Thank you for your input. Yes, I agree with a phased approach and would like to move forward fast. :) We did some work internally on DDL utilizing the babel parser in Calcite. While babel makes Calcite's grammar extensible, at first impression it still seems too cumbersome for a project when t

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-15 Thread Zhang, Xuefu
Hi Bowen, Thank you for your feedback and interest in the project. Your contribution is certainly welcome. Per your suggestion, I have created an Uber JIRA (https://issues.apache.org/jira/browse/FLINK-10556) to track our overall effort on this. For each subtask, we'd like to see a short descrip

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-13 Thread Shuyi Chen
Welcome to the community and thanks for the great proposal, Xuefu! I think the proposal can be divided into 2 stages: making Flink support Hive features, and making Hive work with Flink. I agree with Timo on starting with a smaller scope, so we can make progress faster. As for [6], a prop

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-13 Thread Bowen
Thank you, Xuefu, for bringing up this awesome, detailed proposal! It will resolve lots of existing pain for users like me. In general, I totally agree that improving FlinkSQL's completeness would be a much better starting point than building 'Hive on Flink', as the Hive community is concerned abou

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-12 Thread Jörn Franke
Thank you, very nice. I fully agree with that. > On 11.10.2018 at 19:31, Zhang, Xuefu wrote: > > Hi Jörn, > > Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact > it is one of the two approaches that I named in the beginning of the thread. > As also pointed out the

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-12 Thread Taher Koitawala
Sounds smashing; I think the initial integration will help 60% or so of Flink SQL users, and a lot of other use cases will emerge when we solve the first one. Thanks, Taher Koitawala On Fri 12 Oct, 2018, 10:13 AM Zhang, Xuefu, wrote: > Hi Taher, > > Thank you for your input. I think you emphasized

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Taher, Thank you for your input. I think you emphasized two important points: 1. Hive metastore could be used for storing Flink metadata 2. There are some usability issues around Flink SQL configuration I think we all agree on #1. #2 may well be true and the usability should be improved. How

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
One other thought on the same lines was to use hive tables to store kafka information to process streaming tables. Something like "create table streaming_table ( bootstrapServers string, topic string, keySerialiser string, ValueSerialiser string)" Insert into streaming_table values(,"10.17.1.1:90

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Taher Koitawala
I think integrating Flink with Hive would be an amazing option, and getting Flink's SQL up to pace would be amazing as well. The current Flink SQL syntax to prepare and process a table is too verbose; users need to manually retype table definitions and that's a pain. Hive metastore integration should be do

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Rong, Thanks for your feedback. Some of my earlier comments might have addressed some of your points, so here I'd like to cover some specifics. 1. Yes, I expect that table stats stored in Hive will be used in Flink plan optimization, but it's not part of compatibility concern (yet). 2. Both

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Timo, Thank you for your input. It's exciting to see that the community has already initiated some of the topics. We'd certainly like to leverage the current and previous work and make progress in phases. Here I'd like to comment on a few things on top of your feedback. 1. I think there are

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Rong Rong
Hi Xuefu, Thanks for putting together the overview. I would like to add some more on top of Timo's comments. 1,2. I agree with Timo that proper catalog support should also address the metadata compatibility issues. I was actually wondering if you are referring to something like utilizing table s

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Zhang, Xuefu
Hi Jörn, Thanks for your feedback. Yes, I think Hive on Flink makes sense and in fact it is one of the two approaches that I named in the beginning of the thread. As also pointed out there, this isn't mutually exclusive with the work we proposed inside Flink, and they target different user groups

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-11 Thread Timo Walther
Hi Xuefu, thanks for your proposal, it is a nice summary. Here are my thoughts on your list: 1. I think this is also on our current mid-term roadmap. Flink has lacked proper catalog support for a very long time. Before we can connect catalogs we need to define how to map all the information from

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Jörn Franke
Would it maybe make sense to provide Flink as an engine on Hive („Flink-on-Hive“)? E.g., to address 4,5,6,8,9,10. This could be more loosely coupled than integrating Hive in all possible Flink core modules and thus introducing a very tight dependency on Hive in the core. 1,2,3 could be achieved via

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Zhang, Xuefu
Hi Fabian/Vino, Thank you very much for your encouragement and inquiry. Sorry that I didn't see Fabian's email until I read Vino's response just now. (Somehow Fabian's went to the spam folder.) My proposal contains long-term and short-term goals. Nevertheless, the effort will focus on the followin

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread vino yang
Hi Xuefu, Appreciate this proposal, and like Fabian, it would look better if you can give more details of the plan. Thanks, vino. Fabian Hueske wrote on Wed, Oct 10, 2018 at 5:27 PM: > Hi Xuefu, > > Welcome to the Flink community and thanks for starting this discussion! > Better Hive integration would be re

Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

2018-10-10 Thread Fabian Hueske
Hi Xuefu, Welcome to the Flink community and thanks for starting this discussion! Better Hive integration would be really great! Can you go into details of what you are proposing? I can think of a couple ways to improve Flink in that regard: * Support for Hive UDFs * Support for Hive metadata cat