Re: [DISCUSS] FLIP-91: Support SQL Client Gateway

Shengkai Fang Thu, 28 Apr 2022 19:26:03 -0700

Hi Martijn and LuNing,

Thanks for your feedback!


> The FLIP is called "SQL Client Gateway", but isn't this a REST Gateway
> which would be used by Flink's SQL Client (or other applications)?

Agreed. The FLIP mainly focuses on the Gateway. I think it's better to
rename it to "Support SQL Gateway". WDYT?

> From a user perspective, I would have expected that we start with the
> REST endpoint before explaining how we would integrate this into Flink. Now
> it's quite hard to first understand what we want to offer to users and if
> that will be sufficient for a first version.

Hmm. Considering that the API is basically a set of operations on some
concepts, is it better to introduce the core concepts first? But I agree
that we should start with the REST endpoint. I have reorganized the content
to introduce the REST endpoint first in the public interfaces.

> With Flink 1.15, we're introducing an OpenAPI specification. Can we
> also do this straight away for the REST Gateway?

Yes. We will organize the related APIs into an OpenAPI specification.

>Should we introduce the REST Gateway as part of Flink's main repository?
>Wouldn't we be better off to maintain this in a separate repository under
>ASF?

I think it's better to integrate the Gateway into the Flink code base. The
reasons are:

1. The Gateway relies on the Flink implementation, so I think we'd better
maintain it inside Flink. It takes us a lot of time to upgrade the
sql-gateway in the ververica repo to the latest Flink version.

2. The Gateway is important to Flink itself. Many users need the Gateway to
manage their Flink SQL jobs. In fact, Hive and Spark both have their
Gateway in their own code bases.

But I think it's fine to put other utils, e.g. JDBC, in a separate
repository under the ASF.

> Ideally you would like to be able to support multiple Flink versions
> with one version of the REST Gateway I think?

> Users can upgrade a large number of Flink jobs versions gradually in a
> Gateway service.

Because the Gateway itself relies on Flink's internal implementation, I
think we can just use one Gateway per Flink version. Users can manage the
Gateways with other utils.

>There's no mention of Batch or Streaming in this concept. If I recall
>correctly, the current Flink SQL Gateway can only support Batch. How will
>we support Streaming?

> I can imagine that if a user wants to use a REST Gateway, there's also a
> strong need to combine this with a Catalog.

Yes. I have added a section about the usage of the Gateway. Users can use
SQL to do everything in the Gateway, including:
- configuring the execution parameters, including the execution mode
- managing the metadata with DDL, e.g. registering a catalog
- submitting the job
...
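
To illustrate the list above, here is a minimal Python sketch of the
statements a user might send through one Gateway session, in order. It is
not the Gateway's API: `submit` is a stand-in for whatever endpoint call
ends up being used, and the catalog name, the Hive conf path and the table
names are all hypothetical. Only the SQL statement forms (`SET`,
`CREATE CATALOG`, `USE CATALOG`, `INSERT INTO`) are ordinary Flink SQL.

```python
from typing import Callable, List

# Statements for one session: set the execution mode, register a catalog,
# then submit a job. All names and paths below are hypothetical.
SESSION_SCRIPT = """
SET 'execution.runtime-mode' = 'streaming';
CREATE CATALOG my_hive WITH ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
USE CATALOG my_hive;
INSERT INTO sink_table SELECT * FROM source_table;
"""


def split_statements(script: str) -> List[str]:
    """Split a script on ';' (naive: assumes no ';' inside string literals)."""
    return [s.strip() for s in script.split(";") if s.strip()]


def run_session(script: str, submit: Callable[[str], None]) -> int:
    """Send each statement in order via `submit` and return how many were sent."""
    statements = split_statements(script)
    for statement in statements:
        submit(statement)
    return len(statements)
```

Calling `run_session(SESSION_SCRIPT, print)` simply prints the four
statements in the order the session would receive them.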

>Will there be any requirement with JDBC, as there currently is?

In FLIP-223, we implement the HiveServer2 endpoint. Users can use the
Hive JDBC driver to connect to the Flink SQL Gateway.

> Shall we name this option `sql-gateway.session.init-file` and write it
> into the FLIP-91?

Actually, we already support the -i parameter in the SQL Client. What's
more, Hive also supports the -i parameter on the client side[1].
I think it's fine to move this functionality to the client rather than the
gateway. WDYT?
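
A minimal Python sketch of what client-side handling of an init file could
look like, under stated assumptions: the client reads the file once after
opening a session and replays its statements before any user statement.
The function names, the `submit` callback and the file contents are
hypothetical and are not taken from the SQL Client or the FLIP.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def read_statements(path: Path) -> List[str]:
    """Read a SQL file, drop '--' comment lines, and split on ';' (naive)."""
    lines = [
        line
        for line in path.read_text(encoding="utf-8").splitlines()
        if not line.strip().startswith("--")
    ]
    return [s.strip() for s in "\n".join(lines).split(";") if s.strip()]


def open_session(init_file: Path, submit: Callable[[str], None]) -> None:
    """Client side: replay the init file once, right after the session opens."""
    for statement in read_statements(init_file):
        submit(statement)


def demo() -> List[str]:
    """Write a hypothetical init file and record what the client would send."""
    sent: List[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        init_file = Path(tmp) / "init.sql"
        init_file.write_text(
            "-- register the catalog for every new session\n"
            "CREATE CATALOG my_hive WITH ('type' = 'hive');\n"
            "USE CATALOG my_hive;\n",
            encoding="utf-8",
        )
        open_session(init_file, sent.append)
    sent.append("SELECT 1")  # the user's own statement follows the init statements
    return sent
```

With this shape the gateway needs no init-file option of its own, since it
only ever sees a sequence of ordinary statements.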

[1]
https://github.com/apache/hive/blob/c3fa88a1b7d1475f44383fca913aecf9c664bab0/jdbc/src/java/org/apache/hive/jdbc/HiveConnection.java#L321

Best,
Shengkai





LuNing Wang <[email protected]> wrote on Thu, 28 Apr 2022 at 10:04:

> > * Should we introduce the REST Gateway as part of Flink's main
> repository?
> Wouldn't we be better off to maintain this in a separate repository under
> ASF? Ideally you would like to be able to support multiple Flink versions
> with one version of the REST Gateway I think?
>
> We would be better off maintaining this in a separate repository. It is
> important to support multiple Flink versions. Users can upgrade a large
> number of Flink jobs versions gradually in a Gateway service.
>
> LuNing Wang <[email protected]> wrote on Wed, 27 Apr 2022 at 17:54:
>
> > Hi ShengKai,
> >
> > After I read FLIP-91[1], I want to add an init-file option. Its
> > functionality is the same as option '-i' of Flink SQL Client.
> >
> > When I use Catalog(HiveCatalog), I need to execute `CREATE CATALOG` by
> > this option after SQL Gateway starts every time.
> >
> > Shall we name this option `sql-gateway.session.init-file` and write it
> > into the FLIP-91?
> >
> > Best regards,
> >
> > LuNing Wang
> >
> > [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-91
> >
> > Martijn Visser <[email protected]> wrote on Tue, 26 Apr 2022 at 20:32:
> >
> >> Hi Shengkai,
> >>
> >> Thanks for opening this discussion. I did a first brief pass over the
> FLIP
> >> and I have a couple of questions/remarks:
> >>
> >> * The FLIP is called "SQL Client Gateway", but isn't this a REST Gateway
> >> which would be used by Flink's SQL Client (or other applications)?
> >>
> >> * From a user perspective, I would have expected that we start with the
> >> REST endpoint before explaining how we would integrate this into Flink.
> >> Now
> >> it's quite hard to first understand what we want to offer to users and
> if
> >> that will be sufficient for a first version.
> >>
> >> * With Flink 1.15, we're introducing an OpenAPI specification [1]. Can
> we
> >> also do this straight away for the REST Gateway?
> >>
> >> * Should we introduce the REST Gateway as part of Flink's main
> repository?
> >> Wouldn't we be better off to maintain this in a separate repository
> under
> >> ASF? Ideally you would like to be able to support multiple Flink
> versions
> >> with one version of the REST Gateway I think?
> >>
> >> * There's no mention of Batch or Streaming in this concept. If I recall
> >> correctly, the current Flink SQL Gateway can only support Batch. How
> will
> >> we support Streaming? Will there be any requirement with JDBC, as there
> >> currently is?
> >>
> >> * I can imagine that if a user wants to use a REST Gateway, there's
> also a
> >> strong need to combine this with a Catalog. Do you think this should be
> >> part of this FLIP?
> >>
> >> Best regards,
> >>
> >> Martijn Visser
> >> https://twitter.com/MartijnVisser82
> >> https://github.com/MartijnVisser
> >>
> >> [1]
> >>
> >>
> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobmanager
> >>
> >> On Sun, 24 Apr 2022 at 05:29, Shengkai Fang <[email protected]> wrote:
> >>
> >> > Hi Jiang,
> >> >
> >> > Thanks for your feedback!
> >> >
> >> > > Do the public interfaces of GatewayService refer to any service?
> >> >
> >> > We will only expose one GatewayService implementation. We will put the
> >> > interface into the common package and the developer who wants to
> >> implement
> >> > a new endpoint can just rely on the interface package rather than the
> >> > implementation.
> >> >
> >> > > What's the behavior of SQL Client Gateway working on Yarn or K8S?
> Does
> >> > the SQL Client Gateway support application or session mode on Yarn?
> >> >
> >> > I think the SQL Client Gateway can support submitting jobs in
> >> > application/session mode.
> >> >
> >> > > Is there any event trigger in the operation state machine?
> >> >
> >> > Yes. I have already updated the content and added more details about the
> >> > state machine. During the revision, I found that I had mixed up two
> >> > concepts: job submission and job execution. In fact, we only control the
> >> > submission mode at the gateway layer. Therefore, we don't need to map the
> >> > JobStatus here. If the user expects synchronous behavior, i.e. waiting
> >> > for the job execution to complete before the next statement is allowed
> >> > to execute, then the Operation lifecycle should also contain the job's
> >> > execution, which means users should set `table.dml-sync`.
> >> >
> >> > > What's the return schema for the public interfaces of
> GatewayService?
> >> > Like getTable interface, what's the return value schema?
> >> >
> >> > The API of the GatewayService returns Java objects, and the endpoint can
> >> > organize the objects into the expected schema. The return results are
> >> > also listed in the section ComponetAPI#GatewayService#API. The return
> >> > type of GatewayService#getTable is `ContextResolvedTable`.
> >> >
> >> > > How does the user get the operation log?
> >> >
> >> > The OperationManager will register the LogAppender before the Operation
> >> > executes. The LogAppender will intercept the logger and also write the
> >> > logs related to the Operation to a separate file. When users want to
> >> > fetch the Operation log, the GatewayService will read the content of that
> >> > file and return it.
> >> >
> >> > Best,
> >> > Shengkai
> >> >
> >> >
> >> >
> >> >
> >> > Nicholas Jiang <[email protected]> wrote on Fri, 22 Apr 2022 at 16:21:
> >> >
> >> > > Hi Shengkai.
> >> > >
> >> > > Thanks for driving the proposal of SQL Client Gateway. I have some
> >> > > knowledge of Kyuubi and have some questions about the design:
> >> > >
> >> > > 1.Do the public interfaces of GatewayService refer to any service?
> If
> >> > > referring to HiveService, does GatewayService need interfaces like
> >> > > getQueryId etc.
> >> > >
> >> > > 2.What's the behavior of SQL Client Gateway working on Yarn or K8S?
> >> Does
> >> > > the SQL Client Gateway support application or session mode on Yarn?
> >> > >
> >> > > 3.Is there any event trigger in the operation state machine?
> >> > >
> >> > > 4.What's the return schema for the public interfaces of
> >> GatewayService?
> >> > > Like getTable interface, what's the return value schema?
> >> > >
> >> > > 5.How does the user get the operation log?
> >> > >
> >> > > Thanks,
> >> > > Nicholas Jiang
> >> > >
> >> > > On 2022/04/21 06:42:30 Shengkai Fang wrote:
> >> > > > Hi, Flink developers.
> >> > > >
> >> > > > I want to start a discussion about the FLIP-91: Support Flink SQL
> >> > > > Gateway[1]. Flink SQL Gateway is a service that allows users to
> >> submit
> >> > > and
> >> > > > manage their jobs in the online environment with the pluggable
> >> > endpoints.
> >> > > > The reason why we introduce the Gateway with pluggable endpoints
> is
> >> > that
> >> > > > many users have their preferences. For example, the HiveServer2
> >> users
> >> > > > prefer to use the gateway with HiveServer2-style API, which has
> >> > numerous
> >> > > > tools. However, some Flink-native users may prefer to use the REST
> >> API.
> >> > > > Therefore, we propose the SQL Gateway with pluggable endpoint.
> >> > > >
> >> > > > In the FLIP, we also propose the REST endpoint, which has APIs similar
> >> > > > to those of the gateway in ververica/flink-sql-gateway[2]. Finally, we
> >> > > > discuss how to use the SQL Client to submit statements to the Gateway
> >> > > > with the REST API.
> >> > > >
> >> > > > I would be glad if you could give some feedback about FLIP-91.
> >> > > >
> >> > > > Best,
> >> > > > Shengkai
> >> > > >
> >> > > > [1]
> >> > > >
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >> > > > [2] https://github.com/ververica/flink-sql-gateway
> >> > > >
> >> > >
> >> >
> >>
> >
>
