Hi Shengkai, One comment on the configure_session VS initialize_session, I think configure_session is better than initialize_session, because "initialize" sounds like this method can only be invoked once. But I can see this method is useful to be invoked several times to configure the session during the lifecycle of a session. If calling it initialize_session, should we limit it only be called once?
Best, Jark On Thu, 12 May 2022 at 22:38, Jark Wu <imj...@gmail.com> wrote: > Hi Timo, > > I agree with Shengkai that we should keep Gateway in the Flink main repo. > Here are my thoughts: > > 1) It is not feasible and maintainable to invoke different versions of > Flink dependencies > in one JVM. I'm afraid it would be a nightmare when the code is messed up > with all the reflections. > > 2) Actually, it's very mature to support submitting multi-version by > deploying one > gateway process for one version. Many systems are following this > architecture as Shengkai > mentioned above, including the Ververica Platform. And I don't see any > problem with it. > > 3) As Jingsong said, SQL CLI should build on top of Gateway to share code > and have the ability to > connect to Gateway. If SQL CLI is still kept in the Flink repo, we have > to put Gateway there as well. > > 4) Besides, Gateway is indispensable for a SQL engine (think of > Trino/Presto, Spark, Hive). > Otherwise, Flink will always be a processing system. With Gateway inside > the Flink repo, > we can provide an out-of-box experience as a SQL query engine. Users can > try out the gateway > for the latest version when a new version is released. > > 5) Last, Gateway inside the Flink repo can ensure the highest degree of > version compatibility. > > Best, > Jark > > > On Thu, 12 May 2022 at 19:19, Shengkai Fang <fskm...@gmail.com> wrote: > >> Hi Timo. >> >> Thanks for your feedback! >> >> > It would be great if you can copy/paste some of the tricky design >> decisions into the FLIP. >> >> Yes. I have already added a section about the `Rejected Alternatives`. It >> includes the discuss about the >> - TableInfo and FunctionInfo VS CatalogTable and CatalogFunction >> - Support the multi-version Flink in the Gateway VS Support the >> multi-version in the external Service >> - Merge Gateway into the Flink code base VS Support Gateway in the another >> repo >> ... >> >> >> > Separate code base >> >> Let me summarize all the discussion about the separate code base. Please >> correct me if anything is wrong. >> >> >> About support for the Gateway in the Flink code base. >> >> >> 1. It reduces the cost including tests, release and document. We don't >> need >> to upgrade the Gateway when Flink releases. >> 2. The SQL Client has the ability to submit the SQL to the Gateway. If we >> move the Gateway to the outside repo, the SQL Client doesn't have the >> related dependencies. >> 3. It is benefit for us to reuse code in the same repo. >> >> About supporting the Gateway in another repo. >> >> 1. It is easier to support the multi-version Flink; >> >> >> First of all, I think we should support the multiple Flink version with >> another component that uses the Gateway from the individual Flink >> releases. >> Zeppelin, Livy, and Kyuubi all use similar architecture. >> >> The current Gateway in the FLIP is bound to the specific Flink version, >> which needs to compile the SQL and submit the job to the cluster with the >> specified Flink version. >> We can introduce a service in the future(Maybe we can name it >> MultipleVersionFlinkService? ). It may tell the client where the Gateway >> with specified version is and the client just communicate with the Gateway >> later. It may also just parse the request, convert the request to the >> REST >> API in the FLIP and send to the Gateway with specified version. >> >> Therefore, I think we should merge the SQL Gateway inside the Flink repo. >> The MultipleVersionFlinkService may be inside or outside the repo. >> >> I add more details in the `Rejected Alternatives` in the FLIP. >> >> > Missing REST API spec >> >> Good point! I add a section in the `Appendix` about the serialization of >> the ResultSet. It includes how to serialize the schema, RowData and >> exceptions. >> >> > Consistency of concepts >> >> configure_session VS initialize_session >> >> Okay. I think it's better to use the initialize_session. >> >> TableInfo + UserDefinedFunctionInfo VS CatalogTable and CatalogFunction >> >> The CatalogTable and CatalogFunction are much heavier than the TableInfo >> and UserDefinedFunction. The CatalogManager requires reading from the >> Catalog to get the schema. But in the listTables only care about the table >> name, which is much lighter. Therefore, we propose to use the TableInfo >> with required fields. >> >> > Result Retrieval >> >> Yes. Currently the API is only used for the preview. I added a section in >> the `Rejected Alternatives` about this. >> >> Best, >> Shengkai >> >