I propose that the community review and merge the PRs that I posted, and then evolve the design through 1.8 and beyond. I think having a basic infrastructure in place now will accelerate the effort. Do you agree?
Thanks again! On Wed, Jan 2, 2019 at 11:20 AM Zhang, Xuefu <xuef...@alibaba-inc.com> wrote: > Hi Eron, > > Happy New Year! > > Thank you very much for your contribution, especially during the holidays. > While I'm encouraged by your work, I'd also like to share my thoughts on how > to move forward. > > First, please note that the design discussion is still finalizing, and we > expect some moderate changes, especially around TableFactories. Another > pending change is our decision to shy away from Scala, which will impact > our work. > > Secondly, while your work seems to be about plugging catalog definitions into > the execution environment, which is less impacted by the TableFactory change, I > did notice some duplication between your work and ours. This is no big deal, but > going forward, we should probably communicate better on work > assignment so as to avoid any possible duplication of effort. On the other > hand, I think some of your work is interesting and valuable for inclusion > once we finalize the overall design. > > Thus, please continue your research and experiments, and let us know when > you start working on anything so we can better coordinate. > > Thanks again for your interest and contributions. > > Thanks, > Xuefu > > > > ------------------------------------------------------------------ > From:Eron Wright <eronwri...@gmail.com> > Sent At:2019 Jan. 1 (Tue.) 18:39 > To:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com> > Cc:Xiaowei Jiang <xiaow...@gmail.com>; twalthr <twal...@apache.org>; > piotr <pi...@data-artisans.com>; Fabian Hueske <fhue...@gmail.com>; > suez1224 <suez1...@gmail.com>; Bowen Li <bowenl...@gmail.com> > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > Hi folks, there are clearly some incremental steps to be taken to introduce > catalog support to SQL Client, complementary to what is proposed in the > Flink-Hive Metastore design doc. I was quietly working on this over the > holidays.
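To make the environment-file approach concrete, a catalog registration in sql-client-defaults.yaml might look roughly like this. This is a hypothetical sketch only: the `catalogs` key, the `type` value, and the class name are assumptions for illustration, since the descriptor design was still being finalized at the time.

```yaml
# Hypothetical sketch of a catalog entry in sql-client-defaults.yaml.
# The 'catalogs' key, 'type' value, and class name are assumptions,
# not the finalized design.
catalogs:
  - name: taxidata
    catalog:
      type: custom
      class: com.example.TaxiDataCatalogFactory   # hypothetical factory class
```

With an entry like this, the SQL Client would discover the catalog's tables and functions on startup, without any per-table definitions in the environment file.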
I posted some new sub-tasks, PRs, and sample code > to FLINK-10744. > > What inspired me to get involved is that the catalog interface seems like > a great way to encapsulate a 'library' of Flink tables and functions. For > example, the NYC Taxi dataset (TaxiRides, TaxiFares, various UDFs) may be > nicely encapsulated as a catalog (TaxiData). Such a library should be > fully consumable in SQL Client. > > I implemented the above. Some highlights: > > 1. A fully-worked example of using the Taxi dataset in SQL Client via an > environment file. > - an ASCII video showing the SQL Client in action: > https://asciinema.org/a/C8xuAjmZSxCuApgFgZQyeIHuo > > - the corresponding environment file (will be even more concise once > 'FLINK-10696 Catalog UDFs' is merged): > https://github.com/EronWright/flink-training-exercises/blob/3be008d64be975ced0f1a7e3901a8c5353f72a7e/src/main/dist/conf/sql-client-defaults.yaml > > - the typed API for standalone table applications: > https://github.com/EronWright/flink-training-exercises/blob/3be008d64be975ced0f1a7e3901a8c5353f72a7e/src/main/java/com/dataartisans/flinktraining/examples/table_java/examples/ViaCatalog.java#L50 > > 2. Implementation of the core catalog descriptor and factory. I realize > that some renames may later occur as per the design doc, and would be happy > to do that as a follow-up. > https://github.com/apache/flink/pull/7390 > > 3. Implementation of a connect-style API on TableEnvironment to use the > catalog descriptor. > https://github.com/apache/flink/pull/7392 > > 4. 
Integration into SQL-Client's environment file: > https://github.com/apache/flink/pull/7393 > > I realize that the overall Hive integration is still evolving, but I > believe that these PRs are a good stepping stone. Here's the list (in > bottom-up order): > - https://github.com/apache/flink/pull/7386 > - https://github.com/apache/flink/pull/7388 > - https://github.com/apache/flink/pull/7389 > - https://github.com/apache/flink/pull/7390 > - https://github.com/apache/flink/pull/7392 > - https://github.com/apache/flink/pull/7393 > > Thanks and enjoy 2019! > Eron W > > > On Sun, Nov 18, 2018 at 3:04 PM Zhang, Xuefu <xuef...@alibaba-inc.com> > wrote: > Hi Xiaowei, > > Thanks for bringing up the question. In the current design, the properties > for meta objects are meant to cover anything that's specific to a > particular catalog and agnostic to Flink. Anything that is common (such as > the schema for tables, query text for views, and the UDF classname) is abstracted > as a member of the respective classes. However, this is still in discussion, > and Timo and I will go over this and provide an update. > > Please note that UDF support is a little more involved than what the current > design doc shows. I'm still refining this part. > > Thanks, > Xuefu > > > ------------------------------------------------------------------ > Sender:Xiaowei Jiang <xiaow...@gmail.com> > Sent at:2018 Nov 18 (Sun) 15:17 > Recipient:dev <dev@flink.apache.org> > Cc:Xuefu <xuef...@alibaba-inc.com>; twalthr <twal...@apache.org>; piotr < > pi...@data-artisans.com>; Fabian Hueske <fhue...@gmail.com>; suez1224 < > suez1...@gmail.com> > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > Thanks Xuefu for the detailed design doc! One question on the properties > associated with the catalog objects. Are we going to leave them completely > free-form, or are we going to set some standard for that? 
I think that the > answer may depend on whether we want to explore catalog-specific optimization > opportunities. In any case, I think that it might be helpful to > standardize as much as possible into strongly typed classes and leave > these properties for catalog-specific things. But I think that we can do it > in steps. > > Xiaowei > On Fri, Nov 16, 2018 at 4:00 AM Bowen Li <bowenl...@gmail.com> wrote: > Thanks for keeping on improving the overall design, Xuefu! It looks quite > good to me now. > > It would be nice if the cc-ed Flink committers could help review and confirm! > > > > One minor suggestion: Since the last section of the design doc already touches > on some new SQL statements, shall we add another section in our doc and > formalize the new SQL statements in SQL Client and TableEnvironment that > are gonna come along naturally with our design? Here are some that the > design doc mentioned and some that I came up with: > > To be added: > > - USE <catalog> - set default catalog > - USE <catalog.schema> - set default schema > - SHOW CATALOGS - show all registered catalogs > - SHOW SCHEMAS [FROM catalog] - list schemas in the current default > catalog or the specified catalog > - DESCRIBE VIEW view - show the view's definition in CatalogView > - SHOW VIEWS [FROM schema/catalog.schema] - show views from the current or > a specified schema. > > (DDLs that can be addressed by either our design or Shuyi's DDL design) > > - CREATE/DROP/ALTER SCHEMA schema > - CREATE/DROP/ALTER CATALOG catalog > > To be modified: > > - SHOW TABLES [FROM schema/catalog.schema] - show tables from the current or > a specified schema. Add 'from schema' to the existing 'SHOW TABLES' > statement > - SHOW FUNCTIONS [FROM schema/catalog.schema] - show functions from the > current or a specified schema. 
Add 'from schema' to the existing 'SHOW FUNCTIONS' > statement > > > Thanks, Bowen > > > > On Wed, Nov 14, 2018 at 10:39 PM Zhang, Xuefu <xuef...@alibaba-inc.com> > wrote: > > > Thanks, Bowen, for catching the error. I have granted comment permission > > with the link. > > > > I also updated the doc with the latest class definitions. Everyone is > > encouraged to review and comment. > > > > Thanks, > > Xuefu > > > > ------------------------------------------------------------------ > > Sender:Bowen Li <bowenl...@gmail.com> > > Sent at:2018 Nov 14 (Wed) 06:44 > > Recipient:Xuefu <xuef...@alibaba-inc.com> > > Cc:piotr <pi...@data-artisans.com>; dev <dev@flink.apache.org>; Shuyi > > Chen <suez1...@gmail.com> > > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > > > Hi Xuefu, > > > > Currently the new design doc > > < > https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit > > > > is on "view only" mode, and people cannot leave comments. Can you please > > change it to "can comment" or "can edit" mode? > > > > Thanks, Bowen > > > > > > On Mon, Nov 12, 2018 at 9:51 PM Zhang, Xuefu <xuef...@alibaba-inc.com> > > wrote: > > Hi Piotr, > > > > I have extracted the API portion of the design and the google doc is > here > > < > https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit?usp=sharing > >. > > Please review and provide your feedback. > > > > Thanks, > > Xuefu > > > > ------------------------------------------------------------------ > > Sender:Xuefu <xuef...@alibaba-inc.com> > > Sent at:2018 Nov 12 (Mon) 12:43 > > Recipient:Piotr Nowojski <pi...@data-artisans.com>; dev < > > dev@flink.apache.org> > > Cc:Bowen Li <bowenl...@gmail.com>; Shuyi Chen <suez1...@gmail.com> > > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > > > Hi Piotr, > > > > That sounds good to me. 
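The statements Bowen lists above might read as follows in a SQL Client session. This is a sketch of the proposed syntax only, not of anything implemented at the time; the catalog, schema, and view names here are made up for illustration, and the exact keywords were still open for discussion.

```sql
-- Sketch of the proposed (not yet implemented) statements.
-- 'myhive', 'sales', and 'daily_totals' are made-up names for illustration.
SHOW CATALOGS;                     -- list all registered catalogs
USE myhive;                        -- set the default catalog
SHOW SCHEMAS FROM myhive;          -- list schemas in a specific catalog
USE myhive.sales;                  -- set the default schema
SHOW TABLES FROM myhive.sales;     -- list tables in a specific schema
SHOW FUNCTIONS FROM myhive.sales;  -- list functions in a specific schema
DESCRIBE VIEW daily_totals;        -- show a view's definition
```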
Let's close all the open questions (there are a > > couple of them) in the Google doc, and I should be able to quickly split > > it into the three proposals as you suggested. > > > > Thanks, > > Xuefu > > > > ------------------------------------------------------------------ > > Sender:Piotr Nowojski <pi...@data-artisans.com> > > Sent at:2018 Nov 9 (Fri) 22:46 > > Recipient:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com> > > Cc:Bowen Li <bowenl...@gmail.com>; Shuyi Chen <suez1...@gmail.com> > > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > > > Hi, > > > > > > Yes, it seems like the best solution. Maybe someone else can also > suggest whether we can split it further? Maybe changes in the interface in one > doc, reading from the Hive Metastore in another, and finally storing our meta > information in the Hive Metastore in a third? > > > > Piotrek > > > > > On 9 Nov 2018, at 01:44, Zhang, Xuefu <xuef...@alibaba-inc.com> > wrote: > > > > > > Hi Piotr, > > > > > > That seems to be a good idea! > > > > > > > > Since the google doc for the design is currently under extensive > review, I will leave it as it is for now. However, I'll convert it to two > different FLIPs when the time comes. > > > > > > How does that sound to you? > > > > > > Thanks, > > > Xuefu > > > > > > > > > ------------------------------------------------------------------ > > > Sender:Piotr Nowojski <pi...@data-artisans.com> > > > Sent at:2018 Nov 9 (Fri) 02:31 > > > Recipient:dev <dev@flink.apache.org> > > > Cc:Bowen Li <bowenl...@gmail.com>; Xuefu <xuef...@alibaba-inc.com > > >; Shuyi Chen <suez1...@gmail.com> > > > Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > > > > > Hi, > > > > > > > > Maybe we should split this topic (and the design doc) into a couple of > smaller ones, hopefully independent. The questions that you have asked > Fabian have, for example, very little to do with reading metadata from the Hive > Metastore. 
> > > > > > Piotrek > > > > > >> On 7 Nov 2018, at 14:27, Fabian Hueske <fhue...@gmail.com> wrote: > > >> > > >> Hi Xuefu and all, > > >> > > >> Thanks for sharing this design document! > > > > >> I'm very much in favor of restructuring / reworking the catalog > handling in > > >> Flink SQL as outlined in the document. > > > > >> Most changes described in the design document seem to be rather > general and > > >> not specifically related to the Hive integration. > > >> > > > > >> IMO, there are some aspects, especially those at the boundary of > Hive and > > >> Flink, that need a bit more discussion. For example: > > >> > > >> * What does it take to make the Flink schema compatible with the Hive schema? > > >> * How will Flink tables (descriptors) be stored in HMS? > > >> * How do the two Hive catalogs differ? Could they be integrated into a > > >> single one? When to use which one? > > > > >> * What meta information is provided by HMS? What of this can be > leveraged > > >> by Flink? > > >> > > >> Thank you, > > >> Fabian > > >> > > >> On Fri, Nov 2, 2018 at 00:31, Bowen Li < bowenl...@gmail.com > > > wrote: > > >> > > >>> After taking a look at how other discussion threads work, I think > it's > > >>> actually fine to just keep our discussion here. It's up to you, Xuefu. > > >>> > > >>> The Google doc LGTM. I left some minor comments. > > >>> > > >>> On Thu, Nov 1, 2018 at 10:17 AM Bowen Li <bowenl...@gmail.com> > wrote: > > >>> > > >>>> Hi all, > > >>>> > > >>>> As Xuefu has published the design doc on Google, I agree with > Shuyi's > > > > >>>> suggestion that we probably should start a new email thread like > "[DISCUSS] > > > > >>>> ... Hive integration design ..." on only the dev mailing list for > community > > >>>> devs to review. The current thread goes to both the dev and user lists. > > >>>> > > > > >>>> This email thread is more about validating the general idea and > direction > > > > >>>> with the community, and it's been pretty long and crowded so far. 
> Since > > > > >>>> everyone is in favor of the idea, we can move forward with another > thread to > > >>>> discuss and finalize the design. > > >>>> > > >>>> Thanks, > > >>>> Bowen > > >>>> > > >>>> On Wed, Oct 31, 2018 at 12:16 PM Zhang, Xuefu < > > xuef...@alibaba-inc.com> > > >>>> wrote: > > >>>> > > >>>>> Hi Shuyi, > > >>>>> > > > > >>>>> Good idea. Actually the PDF was converted from a Google doc. Here > is its > > >>>>> link: > > >>>>> > > >>>>> > > > https://docs.google.com/document/d/1SkppRD_rE3uOKSN-LuZCqn4f7dz0zW5aa6T_hBZq5_o/edit?usp=sharing > > >>>>> Once we reach an agreement, I can convert it to a FLIP. > > >>>>> > > >>>>> Thanks, > > >>>>> Xuefu > > >>>>> > > >>>>> > > >>>>> > > >>>>> ------------------------------------------------------------------ > > >>>>> Sender:Shuyi Chen <suez1...@gmail.com> > > >>>>> Sent at:2018 Nov 1 (Thu) 02:47 > > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com> > > >>>>> Cc:vino yang <yanghua1...@gmail.com>; Fabian Hueske < > > fhue...@gmail.com>; > > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> > > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > >>>>> > > >>>>> Hi Xuefu, > > >>>>> > > > > >>>>> Thanks a lot for driving this big effort. I would suggest converting your > > > > >>>>> proposal and design doc into a Google doc, and sharing it on the > dev mailing > > > > >>>>> list for the community to review and comment, with a title like > "[DISCUSS] ... > > > > >>>>> Hive integration design ...". Once approved, we can document it > as a FLIP > > > > >>>>> (Flink Improvement Proposal), and use JIRAs to track the > implementation. > > >>>>> What do you think? > > >>>>> > > >>>>> Shuyi > > >>>>> > > >>>>> On Tue, Oct 30, 2018 at 11:32 AM Zhang, Xuefu < > > xuef...@alibaba-inc.com> > > >>>>> wrote: > > >>>>> Hi all, > > >>>>> > > >>>>> I have also shared a design doc on Hive Metastore integration > that is > > > > >>>>> attached here and also to FLINK-10556 [1]. 
Please kindly review > and share > > >>>>> your feedback. > > >>>>> > > >>>>> > > >>>>> Thanks, > > >>>>> Xuefu > > >>>>> > > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-10556 > > >>>>> ------------------------------------------------------------------ > > >>>>> Sender:Xuefu <xuef...@alibaba-inc.com> > > >>>>> Sent at:2018 Oct 25 (Thu) 01:08 > > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com>; Shuyi Chen < > > >>>>> suez1...@gmail.com> > > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < > > fhue...@gmail.com>; > > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> > > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > >>>>> > > >>>>> Hi all, > > >>>>> > > >>>>> To wrap up the discussion, I have attached a PDF describing the > > > > >>>>> proposal, which is also attached to FLINK-10556 [1]. Please feel > free to > > >>>>> watch that JIRA to track the progress. > > >>>>> > > >>>>> Please also let me know if you have additional comments or > questions. > > >>>>> > > >>>>> Thanks, > > >>>>> Xuefu > > >>>>> > > >>>>> [1] https://issues.apache.org/jira/browse/FLINK-10556 > > >>>>> > > >>>>> > > >>>>> ------------------------------------------------------------------ > > >>>>> Sender:Xuefu <xuef...@alibaba-inc.com> > > >>>>> Sent at:2018 Oct 16 (Tue) 03:40 > > >>>>> Recipient:Shuyi Chen <suez1...@gmail.com> > > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < > > fhue...@gmail.com>; > > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> > > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > >>>>> > > >>>>> Hi Shuyi, > > >>>>> > > > > >>>>> Thank you for your input. Yes, I agree with a phased approach > and would like > > > > >>>>> to move forward fast. :) We did some work internally on DDL > utilizing the babel > > >>>>> parser in Calcite. 
While babel makes Calcite's grammar > extensible, at > >>>>> first impression it still seems too cumbersome for a project when too > > > > >>>>> many extensions are made. It's even challenging to find where the > extension > > > > >>>>> is needed! It would certainly be better if Calcite could magically > support > > > > >>>>> Hive QL by just turning on a flag, such as that for MYSQL_5. I > can also > > > > >>>>> see that this could mean a lot of work on Calcite. Nevertheless, > I will > > > > >>>>> bring up the discussion over there and see what their > community thinks. > > >>>>> > > >>>>> Would you mind sharing more info about the DDL proposal that you > > >>>>> mentioned? We can certainly collaborate on this. > > >>>>> > > >>>>> Thanks, > > >>>>> Xuefu > > >>>>> > > >>>>> ------------------------------------------------------------------ > > >>>>> Sender:Shuyi Chen <suez1...@gmail.com> > > >>>>> Sent at:2018 Oct 14 (Sun) 08:30 > > >>>>> Recipient:Xuefu <xuef...@alibaba-inc.com> > > >>>>> Cc:yanghua1127 <yanghua1...@gmail.com>; Fabian Hueske < > > fhue...@gmail.com>; > > >>>>> dev <dev@flink.apache.org>; user <u...@flink.apache.org> > > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > >>>>> > > >>>>> Welcome to the community and thanks for the great proposal, > Xuefu! I > > > > >>>>> think the proposal can be divided into 2 stages: making Flink > support > > > > >>>>> Hive features, and making Hive work with Flink. I agree with > Timo on > > > > >>>>> starting with a smaller scope, so we can make progress faster. As > for [6], > > > > >>>>> a proposal for DDL is already in progress, and will come after > the unified > > > > >>>>> SQL connector API is done. For supporting Hive syntax, we might > need to > > >>>>> work with the Calcite community, and a recent effort called babel > ( > > >>>>> https://issues.apache.org/jira/browse/CALCITE-2280) in Calcite > might > > >>>>> help here. 
> > >>>>> > > >>>>> Thanks > Shuyi > > >>>>> > > >>>>> On Wed, Oct 10, 2018 at 8:02 PM Zhang, Xuefu < > > xuef...@alibaba-inc.com> > > >>>>> wrote: > > >>>>> Hi Fabian/Vino, > > >>>>> > > > > >>>>> Thank you very much for your encouragement and inquiry. Sorry that I > didn't > > > > >>>>> see Fabian's email until I read Vino's response just now. > (Somehow Fabian's > > >>>>> went to the spam folder.) > > >>>>> > > > > >>>>> My proposal contains long-term and short-term goals. > Nevertheless, the > > >>>>> effort will focus on the following areas, including Fabian's list: > > >>>>> > > >>>>> 1. Hive metastore connectivity - This covers both read/write > access, > > > > >>>>> which means Flink can make full use of Hive's metastore as its > catalog (at > > >>>>> least for batch, but this can be extended for streaming as well). > > > > >>>>> 2. Metadata compatibility - Objects (databases, tables, > partitions, etc.) > > > > >>>>> created by Hive can be understood by Flink, and the reverse > is also true. > > >>>>> 3. Data compatibility - Similar to #2, data produced by Hive can > be > > >>>>> consumed by Flink and vice versa. > > > > >>>>> 4. Support Hive UDFs - For all of Hive's native UDFs, Flink either > provides > > >>>>> its own implementation or makes Hive's implementation work in > Flink. > > >>>>> Further, for user-created UDFs in Hive, Flink SQL should provide a > > > > >>>>> mechanism allowing users to import them into Flink without any > code change > > >>>>> required. > > >>>>> 5. Data types - Flink SQL should support all data types that are > > >>>>> available in Hive. > > >>>>> 6. SQL Language - Flink SQL should support the SQL standard (such as > > > > >>>>> SQL:2003) with extensions to support Hive's syntax and language > features, > > >>>>> around DDL, DML, and SELECT queries. > > > > >>>>> 7. SQL CLI - this is currently being developed in Flink but more > effort is > > >>>>> needed. > > > > >>>>> 8. 
Server - provide a server that's compatible with Hive's > HiveServer2 > > > > >>>>> in its Thrift APIs, such that HiveServer2 users can reuse their > existing client > > >>>>> (such as beeline) but connect to Flink's Thrift server instead. > > > > >>>>> 9. JDBC/ODBC drivers - Flink may provide its own JDBC/ODBC > drivers for > > >>>>> other applications to use to connect to its Thrift server. > > >>>>> 10. Support other users' customizations in Hive, such as Hive > SerDes, > > >>>>> storage handlers, etc. > > > > >>>>> 11. Better task failure tolerance and task scheduling in the Flink > runtime. > > >>>>> > > >>>>> As you can see, achieving all of those requires significant effort > > > > >>>>> across all layers of Flink. However, a short-term goal could > include only > > > > >>>>> core areas (such as 1, 2, 4, 5, 6, 7) or start at a smaller > scope (such as > > >>>>> #3, #6). > > >>>>> > > > > >>>>> Please share your further thoughts. If we generally agree that > this is > > > > >>>>> the right direction, I could come up with a formal proposal > quickly and > > >>>>> then we can follow up with broader discussions. > > >>>>> > > >>>>> Thanks, > > >>>>> Xuefu > > >>>>> > > >>>>> > > >>>>> > > >>>>> ------------------------------------------------------------------ > > >>>>> Sender:vino yang <yanghua1...@gmail.com> > > >>>>> Sent at:2018 Oct 11 (Thu) 09:45 > > >>>>> Recipient:Fabian Hueske <fhue...@gmail.com> > > >>>>> Cc:dev <dev@flink.apache.org>; Xuefu <xuef...@alibaba-inc.com > > >; user < > > >>>>> u...@flink.apache.org> > > >>>>> Subject:Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem > > >>>>> > > >>>>> Hi Xuefu, > > >>>>> > > > > >>>>> I appreciate this proposal, and like Fabian, I think it would be better > if you > > >>>>> could give more details of the plan. > > >>>>> > > >>>>> Thanks, vino. 
> > >>>>> > > >>>>> Fabian Hueske <fhue...@gmail.com> wrote on Wed, Oct 10, 2018 at 5:27 PM: > > >>>>> Hi Xuefu, > > >>>>> > > > > >>>>> Welcome to the Flink community and thanks for starting this > discussion! > > >>>>> Better Hive integration would be really great! > > >>>>> Can you go into details of what you are proposing? I can think of a > > >>>>> couple of ways to improve Flink in that regard: > > >>>>> > > >>>>> * Support for Hive UDFs > > >>>>> * Support for the Hive metadata catalog > > >>>>> * Support for HiveQL syntax > > >>>>> * ??? > > >>>>> > > >>>>> Best, Fabian > > >>>>> > > >>>>> On Tue, Oct 9, 2018 at 19:22, Zhang, Xuefu < > > >>>>> xuef...@alibaba-inc.com> wrote: > > >>>>> Hi all, > > >>>>> > > >>>>> Along with the community's effort, inside Alibaba we have explored > > > > >>>>> Flink's potential as an execution engine not just for stream > processing but > > >>>>> also for batch processing. We are encouraged by our findings and > have > > > > >>>>> initiated our effort to make Flink's SQL capabilities > full-fledged. When > > > > >>>>> comparing what's available in Flink to the offerings from > competitive data > > > > >>>>> processing engines, we identified a major gap in Flink: good > integration > > > > >>>>> with the Hive ecosystem. This is crucial to the success of Flink SQL > and batch > > > > >>>>> due to the well-established data ecosystem around Hive. > Therefore, we have > > > > >>>>> done some initial work along this direction, but there is still a > lot of > > >>>>> effort needed. > > >>>>> > > >>>>> We have two strategies in mind. The first one is to make Flink SQL > > > > >>>>> full-fledged and well-integrated with the Hive ecosystem. This is a > similar > > > > >>>>> approach to what Spark SQL adopted. The second strategy is to > make Hive > > > > >>>>> itself work with Flink, similar to the proposal in [1]. 
Each > approach bears > > > > >>>>> its pros and cons, but they don't need to be mutually exclusive, > each > > >>>>> targeting different users and use cases. We believe that both > will > > >>>>> promote a much greater adoption of Flink beyond stream processing. > > >>>>> > > >>>>> We have been focused on the first approach and would like to > showcase > > > > >>>>> Flink's batch and SQL capabilities with Flink SQL. However, we > have also > > >>>>> planned to start strategy #2 as a follow-up effort. > > >>>>> > > > > >>>>> I'm completely new to Flink (with a short bio [2] below), though > many > > > > >>>>> of my colleagues here at Alibaba are long-time contributors. > Nevertheless, > > > > >>>>> I'd like to share our thoughts and invite your early feedback. At > the same > > > > >>>>> time, I am working on a detailed proposal on Flink SQL's > integration with > > >>>>> the Hive ecosystem, which will also be shared when ready. > > >>>>> > > >>>>> While the ideas are simple, each approach will demand significant > > > > >>>>> effort, more than what we can afford. Thus, the input and > contributions > > >>>>> from the communities are greatly welcomed and appreciated. > > >>>>> > > >>>>> Regards, > > >>>>> > > >>>>> > > >>>>> Xuefu > > >>>>> > > >>>>> References: > > >>>>> > > >>>>> [1] https://issues.apache.org/jira/browse/HIVE-10712 > > > > >>>>> [2] Xuefu Zhang is a long-time open source veteran who has worked or is > working on > > >>>>> many projects under the Apache Foundation, of which he is also an > honored > > > > >>>>> member. About 10 years ago he worked in the Hadoop team at Yahoo, > where those > > > > >>>>> projects were just getting started. Later he worked at Cloudera, > initiating and > > > > >>>>> leading the development of the Hive on Spark project in the > communities and > > > > >>>>> across many organizations. 
Prior to joining Alibaba, he worked at > Uber, > > > > >>>>> where he promoted Hive on Spark to all of Uber's SQL-on-Hadoop > workloads and > > >>>>> significantly improved Uber's cluster efficiency. > > >>>>> > > >>>>> > > >>>>> -- > > > > >>>>> "So you have to trust that the dots will somehow connect in your > future."