Re: [DISCUSS] Integrate Flink SQL well with Hive ecosystem

vino yang Wed, 10 Oct 2018 18:45:37 -0700

Hi Xuefu,

I appreciate this proposal and, like Fabian, think it would be better if you
could give more details of the plan.


Thanks, vino.

Fabian Hueske <[email protected]> wrote on Wed, Oct 10, 2018 at 5:27 PM:

> Hi Xuefu,
>
> Welcome to the Flink community and thanks for starting this discussion!
> Better Hive integration would be really great!
> Can you go into details of what you are proposing? I can think of a couple
> ways to improve Flink in that regard:
>
> * Support for Hive UDFs
> * Support for Hive metadata catalog
> * Support for HiveQL syntax
> * ???
>
> Best, Fabian
>
> On Tue, Oct 9, 2018 at 19:22, Zhang, Xuefu <
> [email protected]> wrote:
>
>> Hi all,
>>
>> Along with the community's effort, inside Alibaba we have explored
>> Flink's potential as an execution engine not just for stream processing but
>> also for batch processing. We are encouraged by our findings and have
>> initiated our effort to make Flink's SQL capabilities full-fledged. When
>> comparing what's available in Flink to the offerings from competing data
>> processing engines, we identified a major gap in Flink: good integration
>> with the Hive ecosystem. This is crucial to the success of Flink SQL and
>> batch due to the well-established data ecosystem around Hive. Therefore, we
>> have done some initial work in this direction, but a lot of effort is still
>> needed.
>>
>> We have two strategies in mind. The first one is to make Flink SQL
>> full-fledged and well-integrated with the Hive ecosystem. This is similar
>> to the approach that Spark SQL adopted. The second strategy is to make Hive
>> itself work with Flink, similar to the proposal in [1]. Each approach has
>> its pros and cons, but they need not be mutually exclusive, since each
>> targets different users and use cases. We believe that both will promote
>> much greater adoption of Flink beyond stream processing.
>>
>> We have been focused on the first approach and would like to showcase
>> Flink's batch and SQL capabilities with Flink SQL. However, we also plan
>> to start strategy #2 as a follow-up effort.
>>
>> I'm completely new to Flink (a short bio [2] is below), though many of
>> my colleagues here at Alibaba are long-time contributors. Nevertheless, I'd
>> like to share our thoughts and invite your early feedback. At the same
>> time, I am working on a detailed proposal on Flink SQL's integration with
>> the Hive ecosystem, which will also be shared when ready.
>>
>> While the ideas are simple, each approach will demand significant effort,
>> more than we can afford. Thus, input and contributions from the
>> communities are most welcome and appreciated.
>>
>> Regards,
>>
>>
>> Xuefu
>>
>> References:
>>
>> [1] https://issues.apache.org/jira/browse/HIVE-10712
>> [2] Xuefu Zhang is a long-time open source veteran who has worked or is
>> working on many projects under the Apache Software Foundation, of which he
>> is also an honored member. About 10 years ago he worked in the Hadoop team
>> at Yahoo, where the projects were just getting started. Later he worked at
>> Cloudera, initiating and leading the development of the Hive on Spark
>> project in the communities and across many organizations. Prior to joining
>> Alibaba, he worked at Uber, where he promoted Hive on Spark for all of
>> Uber's SQL-on-Hadoop workloads and significantly improved Uber's cluster
>> efficiency.
>>
>>
>>
