Hi Martijn,
Thanks for driving this discussion. 

I'd like to share my thoughts on your concerns.
To be precise, FLIP-152 [1] is not about extending Flink SQL itself to support
the Hive query syntax; it provides a Hive dialect option that lets users switch
to the Hive dialect. Judging from the commits for the corresponding
FLINK-21529 [2], it doesn't touch much of Flink itself.
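
For illustration, switching dialects is only a configuration change. Here is a
minimal Java Table API sketch (my assumption: the flink-connector-hive
dependency and a HiveCatalog are set up as the docs describe):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.SqlDialect;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
    // opt in to the Hive dialect: subsequent statements are parsed as Hive SQL
    tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);
    // ... run existing Hive SQL statements via tEnv.executeSql(...) ...
    // switch back to the default Flink SQL dialect at any time
    tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);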

- Regarding the maintenance burden: the current implementation just provides
an option for users to switch to the Hive dialect, so I don't think it will
add much overhead.

- Although Apache Hive has become less popular, it has been widely used as an
open source data warehouse over the years, and many companies still have a
lot of Hive SQL jobs running.

- As I said, the current implementation of the Hive SQL syntax is pluggable,
so we could also add support for Snowflake and others if necessary.

- As for the known security vulnerabilities in Hive, I don't think they are a
critical issue for this discussion.

- The current implementation of Hive SQL syntax support uses a pluggable
HiveParser [3] to parse the SQL statements, so I don't think supporting Hive
syntax adds much complexity to Flink (see the short sketch below).
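
The same switch is also exposed as an ordinary table configuration key (which
is what the SQL client uses); a short sketch, with the key name to be checked
against the docs of the Flink version in use:

    // using the TableEnvironment tEnv from the sketch above;
    // equivalent to "SET table.sql-dialect = hive;" in the SQL client
    tEnv.getConfig().getConfiguration().setString("table.sql-dialect", "hive");
    // statements are now handed to the pluggable Hive parser instead of
    // Flink's default parser; switching back:
    tEnv.getConfig().getConfiguration().setString("table.sql-dialect", "default");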

From my perspective, Hive is still widely used and there are many Hive SQL
jobs running in production, so why not provide users a better experience and
help them migrate their Hive jobs to Flink? Also, it doesn't conflict with
Flink SQL, as it's just a dialect option.

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=165227316
[2] https://issues.apache.org/jira/browse/FLINK-21529
[3] https://issues.apache.org/jira/browse/FLINK-21531
Best regards,
Yuxia.


------------------ Original Message ------------------
From: Martijn Visser <martijnvis...@apache.org>
Sent: Mon Mar 7 19:23:15 2022
To: dev <dev@flink.apache.org>, User <u...@flink.apache.org>
Subject: [DISCUSS] Flink's supported APIs and Hive query syntax
Hi everyone,

Flink currently has 4 APIs with multiple language support which can be used
to develop applications:

* DataStream API, both Java and Scala
* Table API, both Java and Scala
* Flink SQL, both in Flink query syntax and Hive query syntax (partially)
* Python API

Since FLIP-152 [1] the Flink SQL support has been extended to also support
the Hive query syntax. There is now a follow-up FLINK-26360 [2] to address
more syntax compatibility issues.

I would like to open a discussion on Flink directly supporting the Hive
query syntax. I have some concerns about whether 100% Hive query syntax
compatibility is indeed something that we should aim for in Flink.

I can understand that having Hive query syntax support in Flink could help
users with interoperability and with migrating existing jobs. However:

- Adding full Hive query syntax support will mean that we go from 6 fully
supported API/language combinations to 7. I think we are currently already
struggling with maintaining the existing combinations, let alone one more.
- Apache Hive is/appears to be a project that's not that actively developed
anymore. The last release was made in January 2021. Its popularity is
rapidly declining in Europe and the United States, also due to Hadoop
becoming less popular.
- Related to the previous topic, other software like Snowflake,
Trino/Presto, Databricks are becoming more and more popular. If we add full
support for the Hive query syntax, then why not add support for Snowflake
and the others?
- We are supporting Hive versions that are no longer supported by the Hive
community and that have known security vulnerabilities. This also makes Flink
vulnerable to those types of vulnerabilities.
- The current Hive implementation uses a lot of Flink internals, which makes
Flink hard to maintain, adds a lot of tech debt and makes things overly
complex.

From my perspective, I think it would be better to not have Hive query
syntax compatibility directly in Flink itself. Of course we should have a
proper Hive connector and a proper Hive catalog to make connectivity with
Hive (the versions that are still supported by the Hive community) itself
possible. Alternatively, if Hive query syntax is so important, it should
not rely on internals but be available as a dialect/pluggable option. That
could also open up the possibility to add more syntax support for others in
the future, but I really think we should just focus on Flink SQL itself.
That's already hard enough to maintain and improve on.

I'm looking forward to the thoughts of both Developers and Users, so I'm
cross-posting to both mailing lists.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=165227316
[2] https://issues.apache.org/jira/browse/FLINK-26360
