Hi Shengkai,

One comment on the configure_session VS initialize_session,
I think configure_session is better than initialize_session,
because "initialize" sounds like this method can only be invoked once.
But I can see this method is useful to be invoked several times to
configure
the session during the lifecycle of a session. If calling it
initialize_session,
should we limit it only be called once?

Best,
Jark



On Thu, 12 May 2022 at 22:38, Jark Wu <imj...@gmail.com> wrote:

> Hi Timo,
>
> I agree with Shengkai that we should keep Gateway in the Flink main repo.
> Here are my thoughts:
>
> 1) It is not feasible and maintainable to invoke different versions of
> Flink dependencies
> in one JVM. I'm afraid it would be a nightmare when the code is messed up
> with all the reflections.
>
> 2) Actually, it's very mature to support submitting multi-version by
> deploying one
> gateway process for one version. Many systems are following this
> architecture as Shengkai
> mentioned above, including the Ververica Platform. And I don't see any
> problem with it.
>
> 3) As Jingsong said, SQL CLI should build on top of Gateway to share code
> and have the ability to
>  connect to Gateway. If SQL CLI is still kept in the Flink repo, we have
> to put Gateway there as well.
>
> 4) Besides, Gateway is indispensable for a SQL engine (think of
> Trino/Presto, Spark, Hive).
> Otherwise, Flink will always be a processing system. With Gateway inside
> the Flink repo,
> we can provide an out-of-box experience as a SQL query engine. Users can
> try out the gateway
> for the latest version when a new version is released.
>
> 5) Last, Gateway inside the Flink repo can ensure the highest degree of
> version compatibility.
>
> Best,
> Jark
>
>
> On Thu, 12 May 2022 at 19:19, Shengkai Fang <fskm...@gmail.com> wrote:
>
>> Hi Timo.
>>
>> Thanks for your feedback!
>>
>> > It would be great if you can copy/paste some of the tricky design
>> decisions into the FLIP.
>>
>> Yes. I have already added a section about the `Rejected Alternatives`. It
>> includes the discuss about the
>> - TableInfo and FunctionInfo VS CatalogTable and CatalogFunction
>> - Support the multi-version Flink in the Gateway VS Support the
>> multi-version in the external Service
>> - Merge Gateway into the Flink code base VS Support Gateway in the another
>> repo
>> ...
>>
>>
>> > Separate code base
>>
>> Let me summarize all the discussion about the separate code base. Please
>> correct me if anything is wrong.
>>
>>
>> About support for the Gateway in the Flink code base.
>>
>>
>> 1. It reduces the cost including tests, release and document. We don't
>> need
>> to upgrade the Gateway when Flink releases.
>> 2. The SQL Client has the ability to submit the SQL to the Gateway. If we
>> move the Gateway to the outside repo, the SQL Client doesn't have the
>> related dependencies.
>> 3. It is benefit for us to reuse code in the same repo.
>>
>> About supporting the Gateway in another repo.
>>
>> 1. It is easier to support the multi-version Flink;
>>
>>
>> First of all, I think we should support the multiple Flink version with
>> another component that uses the Gateway from the individual Flink
>> releases.
>> Zeppelin, Livy, and Kyuubi all use similar architecture.
>>
>> The current Gateway in the FLIP is bound to the specific Flink version,
>> which needs to compile the SQL and submit the job to the cluster with the
>> specified Flink version.
>> We can introduce a service in the future(Maybe we can name it
>> MultipleVersionFlinkService? ). It may tell the client where the Gateway
>> with specified version is and the client just communicate with the Gateway
>> later. It may also just parse the request,  convert the request to the
>> REST
>> API in the FLIP and send to the Gateway with specified version.
>>
>> Therefore, I think we should merge the SQL Gateway inside the Flink repo.
>> The MultipleVersionFlinkService may be inside or outside the repo.
>>
>> I add more details in the `Rejected Alternatives` in the FLIP.
>>
>> > Missing REST API spec
>>
>> Good point! I add a section in the `Appendix` about the serialization of
>> the ResultSet. It includes how to serialize the schema, RowData and
>> exceptions.
>>
>> > Consistency of concepts
>>
>> configure_session VS initialize_session
>>
>> Okay. I think it's better to use the initialize_session.
>>
>> TableInfo + UserDefinedFunctionInfo  VS  CatalogTable and CatalogFunction
>>
>> The CatalogTable and CatalogFunction are much heavier than the TableInfo
>> and UserDefinedFunction. The CatalogManager requires reading from the
>> Catalog to get the schema. But in the listTables only care about the table
>> name, which is much lighter. Therefore, we propose to use the TableInfo
>> with required fields.
>>
>> > Result Retrieval
>>
>> Yes. Currently the API is only used for the preview. I added a section in
>> the `Rejected Alternatives` about this.
>>
>> Best,
>> Shengkai
>>
>

Reply via email to