Would a couple of folks raise their hands to make a review pass through the 6 PRs listed above? It is a lovely stack of PRs that is 'all green' at the moment. I would be happy to open follow-on PRs to rapidly align with other efforts.
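To give reviewers a quick feel for the surface area, here is a minimal sketch of the programmatic path using the ExternalCatalog hooks that already exist in Flink today. The PRs layer a declarative descriptor/environment-file route on top of this, and names may still shift per the design doc, so treat it as illustrative: the 'taxidata' name and the TaxiRides table are assumptions, and InMemoryExternalCatalog merely stands in for a real catalog implementation such as the sample TaxiData catalog.

    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;
    import org.apache.flink.table.catalog.ExternalCatalog;
    import org.apache.flink.table.catalog.InMemoryExternalCatalog;

    public class CatalogSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tableEnv =
                TableEnvironment.getTableEnvironment(env);

            // Register a catalog that encapsulates a 'library' of tables.
            // A real catalog (e.g. TaxiData) would have tables such as
            // TaxiRides registered in it; this in-memory one starts empty,
            // so the scan below assumes that registration has happened.
            ExternalCatalog taxiData = new InMemoryExternalCatalog("taxidata");
            tableEnv.registerExternalCatalog("taxidata", taxiData);

            // Tables then resolve via catalog-qualified paths.
            Table rides = tableEnv.scan("taxidata", "TaxiRides");
        }
    }

The environment file in the sample repo (linked in my earlier mail below) expresses the same registration declaratively, so SQL Client users never need to touch this API directly.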
Note that the code is agnostic to the details of the ExternalCatalog interface; the code would not be obsolete if/when the catalog interface is enhanced as per the design doc.

On Wed, Jan 2, 2019 at 1:35 PM Eron Wright <eronwri...@gmail.com> wrote:

> I propose that the community review and merge the PRs that I posted, and then evolve the design through 1.8 and beyond. I think having a basic infrastructure in place now will accelerate the effort, do you agree?
>
> Thanks again!
>
> On Wed, Jan 2, 2019 at 11:20 AM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>
>> Hi Eron,
>>
>> Happy New Year!
>>
>> Thank you very much for your contribution, especially during the holidays. While I'm encouraged by your work, I'd also like to share my thoughts on how to move forward.
>>
>> First, please note that the design discussion is still being finalized, and we expect some moderate changes, especially around TableFactories. Another pending change is our decision to shy away from Scala, which will impact our work.
>>
>> Secondly, while your work seems to be about plugging catalog definitions into the execution environment, which is less impacted by the TableFactory change, I did notice some duplication between your work and ours. This is no big deal, but going forward we should probably have better communication on work assignment so as to avoid any duplication of effort. On the other hand, I think some of your work is interesting and valuable for inclusion once we finalize the overall design.
>>
>> Thus, please continue your research and experiments, and let us know when you start working on anything so we can better coordinate.
>>
>> Thanks again for your interest and contributions.
>>
>> Thanks,
>> Xuefu
>>
>> ------------------------------------------------------------------
>> From:Eron Wright <eronwri...@gmail.com>
>> Sent At:2019 Jan. 1 (Tue.) 18:39
>> To:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com>
>> Cc:Xiaowei Jiang <xiaow...@gmail.com>; twalthr <twal...@apache.org>; piotr <pi...@data-artisans.com>; Fabian Hueske <fhue...@gmail.com>; suez1224 <suez1...@gmail.com>; Bowen Li <bowenl...@gmail.com>
>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>>
>> Hi folks, there are clearly some incremental steps to be taken to introduce catalog support to SQL Client, complementary to what is proposed in the Flink-Hive Metastore design doc. I was quietly working on this over the holidays. I posted some new sub-tasks, PRs, and sample code to FLINK-10744.
>>
>> What inspired me to get involved is that the catalog interface seems like a great way to encapsulate a 'library' of Flink tables and functions. For example, the NYC Taxi dataset (TaxiRides, TaxiFares, various UDFs) may be nicely encapsulated as a catalog (TaxiData). Such a library should be fully consumable in SQL Client.
>>
>> I implemented the above. Some highlights:
>>
>> 1. A fully-worked example of using the Taxi dataset in SQL Client via an environment file.
>> - an ASCII video showing the SQL Client in action:
>> https://asciinema.org/a/C8xuAjmZSxCuApgFgZQyeIHuo
>>
>> - the corresponding environment file (will be even more concise once 'FLINK-10696 Catalog UDFs' is merged):
>> https://github.com/EronWright/flink-training-exercises/blob/3be008d64be975ced0f1a7e3901a8c5353f72a7e/src/main/dist/conf/sql-client-defaults.yaml
>>
>> - the typed API for standalone table applications:
>> https://github.com/EronWright/flink-training-exercises/blob/3be008d64be975ced0f1a7e3901a8c5353f72a7e/src/main/java/com/dataartisans/flinktraining/examples/table_java/examples/ViaCatalog.java#L50
>>
>> 2. Implementation of the core catalog descriptor and factory. I realize that some renames may later occur as per the design doc, and would be happy to do those as a follow-up.
>> https://github.com/apache/flink/pull/7390
>>
>> 3. Implementation of a connect-style API on TableEnvironment to use the catalog descriptor.
>> https://github.com/apache/flink/pull/7392
>>
>> 4. Integration into SQL Client's environment file:
>> https://github.com/apache/flink/pull/7393
>>
>> I realize that the overall Hive integration is still evolving, but I believe that these PRs are a good stepping stone. Here's the list (in bottom-up order):
>> - https://github.com/apache/flink/pull/7386
>> - https://github.com/apache/flink/pull/7388
>> - https://github.com/apache/flink/pull/7389
>> - https://github.com/apache/flink/pull/7390
>> - https://github.com/apache/flink/pull/7392
>> - https://github.com/apache/flink/pull/7393
>>
>> Thanks and enjoy 2019!
>> Eron W
>>
>> On Sun, Nov 18, 2018 at 3:04 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> Hi Xiaowei,
>>
>> Thanks for bringing up the question. In the current design, the properties for meta objects are meant to cover anything that's specific to a particular catalog and agnostic to Flink. Anything that is common (such as the schema for tables, the query text for views, and the UDF classname) is abstracted as members of the respective classes. However, this is still under discussion, and Timo and I will go over it and provide an update.
>>
>> Please note that UDF support is a little more involved than what the current design doc shows. I'm still refining this part.
>>
>> Thanks,
>> Xuefu
>>
>> ------------------------------------------------------------------
>> Sender:Xiaowei Jiang <xiaow...@gmail.com>
>> Sent at:2018 Nov 18 (Sun) 15:17
>> Recipient:dev <dev@flink.apache.org>
>> Cc:Xuefu <xuef...@alibaba-inc.com>; twalthr <twal...@apache.org>; piotr <pi...@data-artisans.com>; Fabian Hueske <fhue...@gmail.com>; suez1224 <suez1...@gmail.com>
>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>>
>> Thanks Xuefu for the detailed design doc! One question on the properties associated with the catalog objects: are we going to leave them completely free-form, or are we going to set some standard for them? I think the answer may depend on whether we want to explore catalog-specific optimization opportunities.
>> In any case, I think it might be helpful to standardize as much as possible into strongly typed classes, and leave these properties for catalog-specific things. But I think we can do it in steps.
>>
>> Xiaowei
>>
>> On Fri, Nov 16, 2018 at 4:00 AM Bowen Li <bowenl...@gmail.com> wrote:
>> Thanks for continuing to improve the overall design, Xuefu! It looks quite good to me now.
>>
>> It would be nice if the cc-ed Flink committers could help review and confirm!
>>
>> One minor suggestion: since the last section of the design doc already touches on some new SQL statements, shall we add another section to our doc and formalize the new SQL statements in SQL Client and TableEnvironment that will naturally come along with our design? Here are some that the design doc mentioned and some that I came up with:
>>
>> To be added:
>>
>> - USE <catalog> - set the default catalog
>> - USE <catalog.schema> - set the default schema
>> - SHOW CATALOGS - show all registered catalogs
>> - SHOW SCHEMAS [FROM catalog] - list schemas in the current default catalog or the specified catalog
>> - DESCRIBE VIEW view - show the view's definition in CatalogView
>> - SHOW VIEWS [FROM schema/catalog.schema] - show views from the current or a specified schema
>>
>> (DDLs that can be addressed by either our design or Shuyi's DDL design)
>>
>> - CREATE/DROP/ALTER SCHEMA schema
>> - CREATE/DROP/ALTER CATALOG catalog
>>
>> To be modified:
>>
>> - SHOW TABLES [FROM schema/catalog.schema] - show tables from the current or a specified schema. Adds 'FROM schema' to the existing 'SHOW TABLES' statement
>> - SHOW FUNCTIONS [FROM schema/catalog.schema] - show functions from the current or a specified schema. Adds 'FROM schema' to the existing 'SHOW FUNCTIONS' statement
>>
>> Thanks, Bowen
>>
>> On Wed, Nov 14, 2018 at 10:39 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>>
>> > Thanks, Bowen, for catching the error. I have granted comment permission with the link.
>> >
>> > I also updated the doc with the latest class definitions. Everyone is encouraged to review and comment.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> > ------------------------------------------------------------------
>> > Sender:Bowen Li <bowenl...@gmail.com>
>> > Sent at:2018 Nov 14 (Wed) 06:44
>> > Recipient:Xuefu <xuef...@alibaba-inc.com>
>> > Cc:piotr <pi...@data-artisans.com>; dev <dev@flink.apache.org>; Shuyi Chen <suez1...@gmail.com>
>> > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> >
>> > Hi Xuefu,
>> >
>> > Currently the new design doc <https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit> is in "view only" mode, and people cannot leave comments. Can you please change it to "can comment" or "can edit" mode?
>> >
>> > Thanks, Bowen
>> >
>> > On Mon, Nov 12, 2018 at 9:51 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > Hi Piotr,
>> >
>> > I have extracted the API portion of the design, and the Google doc is here <https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit?usp=sharing>. Please review and provide your feedback.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> > ------------------------------------------------------------------
>> > Sender:Xuefu <xuef...@alibaba-inc.com>
>> > Sent at:2018 Nov 12 (Mon) 12:43
>> > Recipient:Piotr Nowojski <pi...@data-artisans.com>; dev <dev@flink.apache.org>
>> > Cc:Bowen Li <bowenl...@gmail.com>; Shuyi Chen <suez1...@gmail.com>
>> > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> >
>> > Hi Piotr,
>> >
>> > That sounds good to me. Let's close all the open questions (there are a couple of them) in the Google doc, and I should be able to quickly split it into the three proposals as you suggested.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> > ------------------------------------------------------------------
>> > Sender:Piotr Nowojski <pi...@data-artisans.com>
>> > Sent at:2018 Nov 9 (Fri) 22:46
>> > Recipient:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com>
>> > Cc:Bowen Li <bowenl...@gmail.com>; Shuyi Chen <suez1...@gmail.com>
>> > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> >
>> > Hi,
>> >
>> > Yes, it seems like the best solution. Maybe someone else can also suggest whether we can split it further? Maybe changes in the interface in one doc, reading from the Hive metastore in another, and finally storing our meta information in the Hive metastore in a third?
>> >
>> > Piotrek
>> >
>> > > On 9 Nov 2018, at 01:44, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > >
>> > > Hi Piotr,
>> > >
>> > > That seems to be a good idea!
>> > >
>> > > Since the Google doc for the design is currently under extensive review, I will leave it as it is for now. However, I'll convert it into two different FLIPs when the time comes.
>> > >
>> > > How does that sound to you?
>> > >
>> > > Thanks,
>> > > Xuefu
>> > >
>> > > ------------------------------------------------------------------
>> > > Sender:Piotr Nowojski <pi...@data-artisans.com>
>> > > Sent at:2018 Nov 9 (Fri) 02:31
>> > > Recipient:dev <dev@flink.apache.org>
>> > > Cc:Bowen Li <bowenl...@gmail.com>; Xuefu <xuef...@alibaba-inc.com>; Shuyi Chen <suez1...@gmail.com>
>> > > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> > >
>> > > Hi,
>> > >
>> > > Maybe we should split this topic (and the design doc) into a couple of smaller, hopefully independent, ones. The questions that you asked Fabian, for example, have very little to do with reading metadata from the Hive Metastore.
>> > >
>> > > Piotrek
>> > >
>> > >> On 7 Nov 2018, at 14:27, Fabian Hueske <fhue...@gmail.com> wrote:
>> > >>
>> > >> Hi Xuefu and all,
>> > >>
>> > >> Thanks for sharing this design document!
>> > >> I'm very much in favor of restructuring / reworking the catalog handling in Flink SQL as outlined in the document.
>> > >> Most changes described in the design document seem to be rather general and not specifically related to the Hive integration.
>> > >>
>> > >> IMO, there are some aspects, especially those at the boundary of Hive and Flink, that need a bit more discussion. For example:
>> > >>
>> > >> * What does it take to make Flink schema compatible with Hive schema?
>> > >> * How will Flink tables (descriptors) be stored in HMS?
>> > >> * How do both Hive catalogs differ? Could they be integrated into a single one? When to use which one?
>> > >> * What meta information is provided by HMS? What of it can be leveraged by Flink?
>> > >>
>> > >> Thank you,
>> > >> Fabian
>> > >>
>> > >> On Fri., Nov. 2, 2018 at 00:31, Bowen Li <bowenl...@gmail.com> wrote:
>> > >>
>> > >>> After taking a look at how other discussion threads work, I think it's actually fine to just keep our discussion here. It's up to you, Xuefu.
>> > >>>
>> > >>> The Google doc LGTM. I left some minor comments.
>> > >>>
>> > >>> On Thu, Nov 1, 2018 at 10:17 AM Bowen Li <bowenl...@gmail.com> wrote:
>> > >>>
>> > >>>> Hi all,
>> > >>>>
>> > >>>> As Xuefu has published the design doc on Google Docs, I agree with Shuyi's suggestion that we should probably start a new email thread like "[DISCUSS] ... Hive integration design ..." on the dev mailing list only, for community devs to review. The current thread goes to both the dev and user lists.
>> > >>>>
>> > >>>> This email thread is more about validating the general idea and direction with the community, and it's been pretty long and crowded so far. Since everyone is for the idea, we can move forward with another thread to discuss and finalize the design.
>> > >>>>
>> > >>>> Thanks,
>> > >>>> Bowen
>> > >>>>
>> > >>>> On Wed, Oct 31, 2018 at 12:16 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > >>>>
>> > >>>>> Hi Shuyi,
>> > >>>>>
>> > >>>>> Good idea. Actually the PDF was converted from a Google doc. Here is its link:
>> > >>>>> https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing
>> > >>>>> Once we reach an agreement, I can convert it to a FLIP.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>> Xuefu
>> > >>>>>
>> > >>>>> ------------------------------------------------------------------
>> > >>>>> Sender:Shuyi Chen <suez1...@gmail.com>
>> > >>>>> Sent at:2018 Nov 1 (Thu) 02:47
>> > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com>
>> > >>>>> Cc:vino yang <yanghua1...@gmail.com>; Fabian Hueske <fhue...@gmail.com>; dev <dev@flink.apache.org>; user <u...@flink.apache.org>
>> > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> > >>>>>
>> > >>>>> Hi Xuefu,
>> > >>>>>
>> > >>>>> Thanks a lot for driving this big effort. I would suggest converting your proposal and design doc into a Google doc and sharing it on the dev mailing list for the community to review and comment on, with a title like "[DISCUSS] ... Hive integration design ...". Once approved, we can document it as a FLIP (Flink Improvement Proposal) and use JIRAs to track the implementations. What do you think?
>> > >>>>>
>> > >>>>> Shuyi
>> > >>>>>
>> > >>>>> On Tue, Oct 30, 2018 at 11:32 AM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> I have also shared a design doc on Hive metastore integration that is attached here and also to FLINK-10556 [1]. Please kindly review and share your feedback.
>> > >>>>> >> > >>>>> >> > >>>>> Thanks, >> > >>>>> Xuefu >> > >>>>> >> > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-10556 >> > >>>>> >> ------------------------------------------------------------------ >> > >>>>> Sender:Xuefu <xuef...@alibaba-inc.com> >> > >>>>> Sent at:2018 Oct 25 (Thu) 01:08 >> > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com>; Shuyi Chen < >> > >>>>> suez1...@gmail.com> >> > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < >> > fhue...@gmail.com>; >> > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> >> > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive >> ecosystem >> > >>>>> >> > >>>>> Hi all, >> > >>>>> >> > >>>>> To wrap up the discussion, I have attached a PDF describing the >> > >> > >>>>> proposal, which is also attached to FLINK-10556 [1]. Please feel >> free to >> > >>>>> watch that JIRA to track the progress. >> > >>>>> >> > >>>>> Please also let me know if you have additional comments or >> questions. >> > >>>>> >> > >>>>> Thanks, >> > >>>>> Xuefu >> > >>>>> >> > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-10556 >> > >>>>> >> > >>>>> >> > >>>>> >> ------------------------------------------------------------------ >> > >>>>> Sender:Xuefu <xuef...@alibaba-inc.com> >> > >>>>> Sent at:2018 Oct 16 (Tue) 03:40 >> > >>>>> Recipient:Shuyi Chen <suez1...@gmail.com> >> > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < >> > fhue...@gmail.com>; >> > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> >> > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive >> ecosystem >> > >>>>> >> > >>>>> Hi Shuyi, >> > >>>>> >> > >> > >>>>> Thank you for your input. Yes, I agreed with a phased approach >> and like >> > >> > >>>>> to move forward fast. :) We did some work internally on DDL >> utilizing babel >> > >>>>> parser in Calcite. While babel makes Calcite's grammar >> extensible, at >> > >>>>> first impression it still seems too cumbersome for a project >> when too >> > >> > >>>>> much extensions are made. It's even challenging to find where >> the extension >> > >> > >>>>> is needed! It would be certainly better if Calcite can magically >> support >> > >> > >>>>> Hive QL by just turning on a flag, such as that for MYSQL_5. I >> can also >> > >> > >>>>> see that this could mean a lot of work on Calcite. Nevertheless, >> I will >> > >> > >>>>> bring up the discussion over there and to see what their >> community thinks. >> > >>>>> >> > >>>>> Would mind to share more info about the proposal on DDL that you >> > >>>>> mentioned? We can certainly collaborate on this. >> > >>>>> >> > >>>>> Thanks, >> > >>>>> Xuefu >> > >>>>> >> > >>>>> >> ------------------------------------------------------------------ >> > >>>>> Sender:Shuyi Chen <suez1...@gmail.com> >> > >>>>> Sent at:2018 Oct 14 (Sun) 08:30 >> > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com> >> > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < >> > fhue...@gmail.com>; >> > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> >> > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive >> ecosystem >> > >>>>> >> > >>>>> Welcome to the community and thanks for the great proposal, >> Xuefu! I >> > >> > >>>>> think the proposal can be divided into 2 stages: making Flink to >> support >> > >> > >>>>> Hive features, and make Hive to work with Flink. I agreed with >> Timo that on >> > >> > >>>>> starting with a smaller scope, so we can make progress faster. 
>> > >>>>> As for [6], a proposal for DDL is already in progress, and will come after the unified SQL connector API is done. For supporting Hive syntax, we might need to work with the Calcite community, and a recent effort called babel (https://issues.apache.org/jira/browse/CALCITE-2280) in Calcite might help here.
>> > >>>>>
>> > >>>>> Thanks
>> > >>>>> Shuyi
>> > >>>>>
>> > >>>>> On Wed, Oct 10, 2018 at 8:02 PM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > >>>>> Hi Fabian/Vino,
>> > >>>>>
>> > >>>>> Thank you very much for your encouragement and inquiry. Sorry that I didn't see Fabian's email until I read Vino's response just now. (Somehow Fabian's went to the spam folder.)
>> > >>>>>
>> > >>>>> My proposal contains long-term and short-term goals. Nevertheless, the effort will focus on the following areas, including Fabian's list:
>> > >>>>>
>> > >>>>> 1. Hive metastore connectivity - This covers both read and write access, which means Flink can make full use of Hive's metastore as its catalog (at least for batch, but this can be extended to streaming as well).
>> > >>>>> 2. Metadata compatibility - Objects (databases, tables, partitions, etc.) created by Hive can be understood by Flink, and the reverse is also true.
>> > >>>>> 3. Data compatibility - Similar to #2, data produced by Hive can be consumed by Flink and vice versa.
>> > >>>>> 4. Support for Hive UDFs - For all of Hive's native UDFs, Flink either provides its own implementation or makes Hive's implementation work in Flink. Further, for user-created UDFs in Hive, Flink SQL should provide a mechanism allowing users to import them into Flink without any code change required.
>> > >>>>> 5. Data types - Flink SQL should support all data types that are available in Hive.
>> > >>>>> 6. SQL language - Flink SQL should support the SQL standard (such as SQL:2003) with extensions to support Hive's syntax and language features, around DDL, DML, and SELECT queries.
>> > >>>>> 7. SQL CLI - This is currently being developed in Flink, but more effort is needed.
>> > >>>>> 8. Server - Provide a server that's compatible with Hive's HiveServer2 in its Thrift APIs, such that HiveServer2 users can reuse their existing clients (such as Beeline) but connect to Flink's Thrift server instead.
>> > >>>>> 9. JDBC/ODBC drivers - Flink may provide its own JDBC/ODBC drivers for other applications to use to connect to its Thrift server.
>> > >>>>> 10. Support for other user customizations in Hive, such as Hive SerDes, storage handlers, etc.
>> > >>>>> 11. Better task failure tolerance and task scheduling in the Flink runtime.
>> > >>>>>
>> > >>>>> As you can see, achieving all of those requires significant effort across all layers of Flink. However, a short-term goal could include only core areas (such as 1, 2, 4, 5, 6, 7) or start at a smaller scope (such as #3, #6).
>> > >>>>>
>> > >>>>> Please share your further thoughts.
>> > >>>>> If we generally agree that this is the right direction, I could come up with a formal proposal quickly, and then we can follow up with broader discussions.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>> Xuefu
>> > >>>>>
>> > >>>>> ------------------------------------------------------------------
>> > >>>>> Sender:vino yang <yanghua1...@gmail.com>
>> > >>>>> Sent at:2018 Oct 11 (Thu) 09:45
>> > >>>>> Recipient:Fabian Hueske <fhue...@gmail.com>
>> > >>>>> Cc:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com>; user <u...@flink.apache.org>
>> > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem
>> > >>>>>
>> > >>>>> Hi Xuefu,
>> > >>>>>
>> > >>>>> I appreciate this proposal and, like Fabian, I think it would be better if you could give more details of the plan.
>> > >>>>>
>> > >>>>> Thanks, vino.
>> > >>>>>
>> > >>>>> On Wed., Oct. 10, 2018 at 5:27 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>> > >>>>> Hi Xuefu,
>> > >>>>>
>> > >>>>> Welcome to the Flink community and thanks for starting this discussion! Better Hive integration would be really great!
>> > >>>>> Can you go into details of what you are proposing? I can think of a couple of ways to improve Flink in that regard:
>> > >>>>>
>> > >>>>> * Support for Hive UDFs
>> > >>>>> * Support for the Hive metadata catalog
>> > >>>>> * Support for HiveQL syntax
>> > >>>>> * ???
>> > >>>>>
>> > >>>>> Best, Fabian
>> > >>>>>
>> > >>>>> On Tue., Oct. 9, 2018 at 19:22, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> Along with the community's effort, inside Alibaba we have explored Flink's potential as an execution engine not just for stream processing but also for batch processing. We are encouraged by our findings and have initiated our effort to make Flink's SQL capabilities full-fledged. When comparing what's available in Flink to the offerings from competitive data processing engines, we identified a major gap in Flink: good integration with the Hive ecosystem. This is crucial to the success of Flink SQL and batch due to the well-established data ecosystem around Hive. Therefore, we have done some initial work along this direction, but there is still a lot of effort needed.
>> > >>>>>
>> > >>>>> We have two strategies in mind. The first one is to make Flink SQL full-fledged and well-integrated with the Hive ecosystem. This is a similar approach to what Spark SQL adopted. The second strategy is to make Hive itself work with Flink, similar to the proposal in [1]. Each approach bears its pros and cons, but they don't need to be mutually exclusive, with each targeting different users and use cases. We believe that both will promote a much greater adoption of Flink beyond stream processing.
>> > >>>>>
>> > >>>>> We have been focused on the first approach and would like to showcase Flink's batch and SQL capabilities with Flink SQL. However, we have also planned to start strategy #2 as a follow-up effort.
>> > >>>>>
>> > >>>>> I'm completely new to Flink (with a short bio [2] below), though many of my colleagues here at Alibaba are long-time contributors.
>> > >>>>> Nevertheless, I'd like to share our thoughts and invite your early feedback. At the same time, I am working on a detailed proposal on Flink SQL's integration with the Hive ecosystem, which will also be shared when ready.
>> > >>>>>
>> > >>>>> While the ideas are simple, each approach will demand significant effort, more than what we can afford. Thus, the input and contributions from the communities are greatly welcome and appreciated.
>> > >>>>>
>> > >>>>> Regards,
>> > >>>>>
>> > >>>>> Xuefu
>> > >>>>>
>> > >>>>> References:
>> > >>>>>
>> > >>>>> [1] https://issues.apache.org/jira/browse/HIVE-10712
>> > >>>>> [2] Xuefu Zhang is a long-time open source veteran who has worked or is working on many projects under the Apache Foundation, of which he is also an honored member. About 10 years ago he worked in the Hadoop team at Yahoo, where the projects had just got started. Later he worked at Cloudera, initiating and leading the development of the Hive on Spark project in the communities and across many organizations. Prior to joining Alibaba, he worked at Uber, where he promoted Hive on Spark to all of Uber's SQL-on-Hadoop workload and significantly improved Uber's cluster efficiency.
>> > >>>>>
>> > >>>>> --
>> > >>>>> "So you have to trust that the dots will somehow connect in your future."