Hi Xuefu,

I appreciate this proposal. Like Fabian, I think it would help a lot if you could give more details about the plan.
Thanks,
vino.

Fabian Hueske <fhue...@gmail.com> wrote on Wed, Oct 10, 2018 at 5:27 PM:

> Hi Xuefu,
>
> Welcome to the Flink community and thanks for starting this discussion!
> Better Hive integration would be really great!
> Can you go into the details of what you are proposing? I can think of a
> couple of ways to improve Flink in that regard:
>
> * Support for Hive UDFs
> * Support for the Hive metadata catalog
> * Support for HiveQL syntax
> * ???
>
> Best, Fabian
>
> On Tue, Oct 9, 2018 at 19:22, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
>
>> Hi all,
>>
>> Along with the community's effort, inside Alibaba we have explored
>> Flink's potential as an execution engine not just for stream processing
>> but also for batch processing. We are encouraged by our findings and have
>> initiated an effort to make Flink's SQL capabilities full-fledged. When
>> comparing what's available in Flink to the offerings of competing data
>> processing engines, we identified a major gap in Flink: the lack of good
>> integration with the Hive ecosystem. This integration is crucial to the
>> success of Flink SQL and batch processing because of the well-established
>> data ecosystem around Hive. We have therefore done some initial work in
>> this direction, but a lot of effort is still needed.
>>
>> We have two strategies in mind. The first is to make Flink SQL
>> full-fledged and well integrated with the Hive ecosystem. This is similar
>> to the approach Spark SQL adopted. The second strategy is to make Hive
>> itself work with Flink, similar to the proposal in [1]. Each approach has
>> its pros and cons, but they don't need to be mutually exclusive, since
>> each targets different users and use cases. We believe that both will
>> promote much greater adoption of Flink beyond stream processing.
>>
>> We have been focused on the first approach and would like to showcase
>> Flink's batch and SQL capabilities with Flink SQL. However, we also plan
>> to start strategy #2 as a follow-up effort.
>>
>> I'm completely new to Flink (see my short bio [2] below), though many of
>> my colleagues here at Alibaba are long-time contributors. Nevertheless,
>> I'd like to share our thoughts and invite your early feedback. At the same
>> time, I am working on a detailed proposal for Flink SQL's integration with
>> the Hive ecosystem, which will also be shared when ready.
>>
>> While the ideas are simple, each approach will demand significant effort,
>> more than we can afford alone. Thus, input and contributions from the
>> community are greatly welcome and appreciated.
>>
>> Regards,
>>
>> Xuefu
>>
>> References:
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10712
>> [2] Xuefu Zhang is a long-time open source veteran who has worked, and
>> continues to work, on many projects under the Apache Software Foundation,
>> of which he is also an honored member. About 10 years ago he worked on the
>> Hadoop team at Yahoo, when the projects had just gotten started. Later he
>> worked at Cloudera, initiating and leading the development of the Hive on
>> Spark project across the communities and many organizations. Prior to
>> joining Alibaba, he worked at Uber, where he promoted Hive on Spark for all
>> of Uber's SQL-on-Hadoop workloads and significantly improved Uber's
>> cluster efficiency.