As there was already a discussion in the doc, I'll just summarize my opinions here on the proposed execution of this FLIP.
I think we should avoid exposing internal details, of which I consider Calcite to be part, and instead reuse what we already have for defining an AST from the Table API, which I'll refer to in this mail as the Operation tree.

First of all, the reason I think this FLIP is not a good idea is that it proposes to expose types out of our control, i.e. an API we cannot control and may realistically never be able to stabilize. A Calcite bump in the table project is already pretty hard today, as shown by tasks like https://github.com/apache/flink/pull/13577, and this would make it even harder. Essentially, it would couple us to Calcite even more and replace the maintenance/complexity burden this FLIP is trying to get rid of with a different but equally big one.

There are also some technical aspects that seem a bit overlooked here:

* What about Scala? Is flink-table-planner-spi going to be a Scala module with the related suffix? I see you want to expose a couple of types that are implemented in Scala right now, and making this module Scala-dependent makes shipping both the modules that use it and flink-table-planner-loader even more complicated.
* Are you sure exposing the Calcite interfaces is going to be enough? Don't you also require some instance-specific methods, e.g. FlinkTypeFactory#toLogicalType? What if at some point you need to expose something like FlinkTypeFactory itself? How do you plan to support and stabilize it in the long term?

Now let me talk a bit about the Operation tree. For those who don't know what it is: it's the pure Flink AST for defining DML, used to convert the Table API DSL into an AST the planner can manipulate. Essentially, it's our own version of the RelNode/RexNode tree.

This Operation tree can already be used by Hive, without any API changes on the Table side. You just need to downcast TableEnvironmentImpl to use getPlanner() and then Planner#translate, or alternatively you can add getPlanner to TableEnvironmentInternal directly (see the sketch at the end of this mail). From what I've seen of your use case, and please correct me if I'm wrong, you can implement your SQL -> Operation tree layer without substantial changes on either side.

The reasons why I think this is a better idea than exposing Calcite and RelNodes directly:

* The aforementioned downsides of exposing Calcite
* It doesn't require a new API to get started
* It doesn't add complexity on the planner side, it just removes the existing coupling with Hive
* Letting another project use the Operation tree will harden it, make it more stable and eventually lead it to become public

The last point in particular is extremely interesting for the future of the project: having a stable public Operation tree will allow people to implement other relational APIs on top of Flink SQL, manipulate the AST to define new semantics, or do even crazier things we can't think of right now, leading to a broader and more diverse ecosystem. At the end of the day, that is exactly what Hive is doing right now: defining a new relational API on top of the Flink Table planner functionalities.
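To make the downcast approach above a bit more concrete, here is a rough sketch, not a definitive implementation. It assumes the current internal entry points TableEnvironmentImpl#getPlanner() and Planner#translate(List<ModifyOperation>) stay as they are today; parseHiveDml is a purely hypothetical placeholder for the Hive SQL -> Operation tree layer:

import java.util.Collections;
import java.util.List;

import org.apache.flink.api.dag.Transformation;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.internal.TableEnvironmentImpl;
import org.apache.flink.table.delegation.Planner;
import org.apache.flink.table.operations.ModifyOperation;

public class HiveDialectSketch {

    // Translates a Hive DML statement by going through the Operation tree,
    // without touching any Calcite types.
    public static List<Transformation<?>> translateHiveDml(
            TableEnvironment tEnv, String hiveStatement) {
        // Downcast to the internal implementation to reach the planner;
        // alternatively, getPlanner() could be added to TableEnvironmentInternal.
        TableEnvironmentImpl tEnvImpl = (TableEnvironmentImpl) tEnv;
        Planner planner = tEnvImpl.getPlanner();

        // Hypothetical Hive front end: parse the statement into the Flink
        // Operation tree (a single ModifyOperation for a DML statement).
        ModifyOperation operation = parseHiveDml(hiveStatement);

        // Hand the Operation tree to the planner, which turns it into Flink
        // Transformations exactly as it does for the Table API / default SQL.
        return planner.translate(Collections.singletonList(operation));
    }

    // Placeholder for the Hive SQL -> Operation tree layer discussed above.
    private static ModifyOperation parseHiveDml(String hiveStatement) {
        throw new UnsupportedOperationException("to be implemented by the Hive connector");
    }
}

The exact entry point can of course differ, but the point is that nothing beyond the Operation tree and the Planner interface would be needed on the connector side.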
On Mon, Mar 28, 2022 at 10:57 AM 罗宇侠(莫辞)
<luoyuxia.luoyu...@alibaba-inc.com.invalid> wrote:

> Sorry for this email, it seems there's some format issue in my email client.
> Just ignore it, as it's a duplicate of [DISCUSS] FLIP-216 Decouple Hive
> connector with Flink planner [1]
>
> [1] https://lists.apache.org/thread/6xg33nxrnow5zy7xwqk5nwp00h9gcsbc
>
> Best regards,
> Yuxia
> ------------------------------------------------------------------
> From: 罗宇侠 <yuxia...@qq.com.INVALID>
> Date: 2022-03-25 20:19:37
> To: dev <dev@flink.apache.org>
> Subject: FLIP-216 Decouple Hive connector with Flink planner
>
> Hi, everyone
>
> I would like to open a discussion about decoupling the Hive connector from the Flink table planner.
> It's a follow-up discussion after the Hive syntax discussion [1], but it only focuses on how to decouple the Hive connector. The original doc is here [2], from which you can see the detailed work and the heated discussion about whether to expose the Calcite API or reuse the Operation tree to decouple.
> I have created FLIP-216: Decouple Hive connector with Flink planner [3] for it.
>
> Thanks and looking forward to a lively discussion!
>
> [1] https://lists.apache.org/thread/2w046dwl46tf2wy750gzmt0qrcz17z8t
> [2] https://docs.google.com/document/d/1LMQ_mWfB_mkYkEBCUa2DgCO2YdtiZV7YRs2mpXyjdP4/edit?usp=sharing
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-216%3A+Decouple+Hive+connector+with+Flink+planner
>
> Best regards,
> Yuxia