Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Shammon FY Fri, 02 Dec 2022 02:12:46 -0800

Hi zelin

Thanks for driving this discussion.


I notice that the sql-client will interact with sql-gateway by `REST
Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
sql-gateway?

Then the sql-client can connect the gateway with jdbc-sdk, on the other
hand, the other applications and tools such as jmeter can use the jdbc-sdk
to connect sql-gateway too.

Best,
Shammon


On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin....@gmail.com> wrote:

> Hi Jim,
>
> Thanks for your feedback!
>
> > Should this configuration be mentioned in the FLIP?
>
> Sure.
>
> > some way for the server to be able to limit the number of requests it
> receives.
> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
> we
> didn't consider much about this. I think the option is enough currently.
> I will add
> the improvement suggestions to the ‘Future Work’.
>
> > I wonder if two other options are possible
>
> To forward the raw format to gateway and then to client is possible. The
> raw
> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
> can find
> a way to get this result without wrapping it. Second, constructing a
> ‘InternalTypeInfo’.
> We can construct it using the schema information (data’s logical type).
> After
> construction, we can get the ’TypeSerializer’ to deserialize the raw
> result.
>
>
>
>
> > 2022年12月1日 04:54，Jim Hughes <jhug...@confluent.io.INVALID> 写道：
> >
> > Hi Yu,
> >
> > Thanks for moving my comments to this thread!  Also, thank you for
> > answering my questions; it is helping me understand the SQL Gateway
> > better.
> >
> > 5.
> >> Our idea is to introduce a new session option (like
> > 'sql-client.result.fetch-interval') to control
> > the fetching requests sending frequency. What do you think?
> >
> > Should this configuration be mentioned in the FLIP?
> >
> > One slight concern I have with having 'sql-client.result.fetch-interval'
> as
> > a session configuration is that users could set it low and cause the
> client
> > to send a large volume of requests to the SQL gateway.
> >
> > Generally, I'd like to see some way for the server to be able to limit
> the
> > number of requests it receives.  If that really needs to be done by a
> proxy
> > in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> > think my concern here should be blocking in any way.)
> >
> > 7.
> >> What is the serialization lifecycle for results?
> >
> > I wonder if two other options are possible:
> > 3) Could the Gateway just forward the result byte array?  (Or does the
> > Gateway need to deserialize the response in order to understand it for
> some
> > reason?)
> > 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> > the Client read the format which the JobManager sends?)
> >
> > Thanks again!
> >
> > Cheers,
> >
> > Jim
> >
> > On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin....@gmail.com> wrote:
> >
> >> Hi, all
> >>
> >> Thanks Jim’s questions below. Here I’d like to reply to them.
> >>
> >>>  1. For the Client Parser, is it going to work with the extended syntax
> >>>  from the Flink Table Store?
> >>>
> >>>  2. Relatedly, what will happen if an older Client tries to handle
> >> syntax
> >>>  that a newer service supports?  (Suppose I use a 1.17 client with a
> >> 1.18
> >>>  Gateway/system which has a new keyword.  Is there anything we should
> be
> >>>  designing for upfront?)
> >>>
> >>>  3. How will client and server version mismatches be handled?  Will a
> >>>  single gateway be able to support multiple endpoint versions?
> >>>  4. How are commands which change a session handled?  Are those sent
> via
> >>>  an ExecuteStatementRequest?
> >>>
> >>>  5. The remote POC uses polling for getting back status and getting
> back
> >>>  results.  Would it be possible to switch to web sockets or some other
> >>>  mechanism to avoid polling?  If polling is used for both, the polling
> >>>  frequency should be different between local and remote configurations.
> >>>
> >>>  6. What does this sentence mean?  "The reason why we didn't get the
> sql
> >>>  type in client side is because it's hard for the lightweight
> >> client-level
> >>>  parser to recognize some sql type  sql, such as query with CTE.  "
> >>>
> >>>  7. What is the serialization lifecycle for results?  It makes sense to
> >>>  have some control over whether the gateway returns results as SQL or
> >> JSON.
> >>>  I'd love to see a way to avoid needing to serialize and deserialize
> >> results
> >>>  on the SQL Gateway if possible.  I'm still new enough to the project
> >> that
> >>>  I'm not sure if that's readily possible.  Maybe the SQL Gateway's
> >> return
> >>>  type can be sent as part of the request so that the JobManager can
> send
> >>>  back results in an advantageous format?
> >>>
> >>>  8. Does ErrorType need to be marked as @PublicEvolving?
> >>>
> >>> I'm excited for the SQL client to support gateway mode!  Given the
> change
> >>> in design, do you think it'll still be part of the Flink 1.17 release?
> >>
> >> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
> because
> >> if the
> >> sql type is not recognized, the sql will be submitted to the gateway
> >> directly.
> >>
> >> For more information: Actually, the proposed ClientParser only do two
> >> things:
> >> (1) Tell client commands (help, clear, etc) and sqls apart.
> >> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
> raw
> >> string
> >> for the SHOW CREATE result instead of table). Here the recognization of
> >> sql types
> >> mostly affects the print style, and unrecognized sql also can be
> submitted
> >> to cluster.
> >> So the Client with new ClientParser can work compatible with new syntax.
> >>
> >> 2. First, I'd like to explain that the gateway APIs and supported syntax
> >> is two things.
> >> For example, ‘configureSession' and 'completeStatement' are APIs. As
> >> mentioned
> >> in #1, the sql statements which syntax is unknown will be submitted to
> the
> >> gateway,
> >> and whether they can be executed normally depends on whether the
> execution
> >> environment supports the syntax.
> >>
> >>> Is there anything we should be designing for upfront?
> >>
> >> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
> >> gateway APIs.
> >>
> >> 3.
> >>> How will client and server version mismatches be handled?
> >>
> >> A lower version client can work compatible with a higher version gateway
> >> because the
> >> old interfaces won’t be deleted. When a higher version client connects
> to
> >> a lower version
> >> gateway, the client should notify the users if they try to use
> unsupported
> >> features. For
> >> example, the client start option ‘-i’  means using initialization file
> to
> >> initialize the session.
> >> We plan to use the gateway’s ‘configureSession’ to implement it. But
> this
> >> API is not
> >> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
> >> user try to
> >> use ‘-i’ option to start the client with the 1.16 gateway, the client
> >> should tell the user that
> >> Can’t execute ‘-i’ option with gateway which version is lower than V2.
> >>
> >>> Will a single gateway be able to support multiple endpoint versions?
> >>
> >> Currently, the gateway only starts a highest version endpoint and the
> >> higher version endpoint
> >> is compatible with the lower version endpoint’s protocol.
> >>
> >> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the
> session
> >> configuration.
> >> Notice: the client can’t change the session (I mean, close current
> session
> >> and open another
> >> one). I’m not sure if you have need to change the session itself?
> >>
> >> 5.
> >>> Would it be possible to switch to web sockets or some other mechanism
> >> to avoid polling?
> >>
> >> Your suggestion is very good, but this flip is for supporting the remote
> >> client. How about taking
> >> it as a future work?
> >>
> >>> If polling is used for both, the polling frequency should be different
> >> between local and remote
> >> configurations.
> >>
> >> Our idea is to introduce a new session option (like
> >> 'sql-client.result.fetch-interval') to control
> >> the fetching requests sending frequency. What do you think?
> >>
> >> For more information: we are inclined to keep the polling behavior in
> this
> >> version. For streaming
> >> query, fetching results synchronously may occupy resources of the
> gateway
> >> in a long period.
> >> For example, if the job doesn’t return results for a long time because
> the
> >> window has not been
> >> triggered, the synchronously fetching will keep occupying the
> connection.
> >> In asynchronous
> >> situation, the gateway can return a NOT_READY_RESULT quickly and release
> >> the resources
> >> for other clients to use. I think we can make some improvements for the
> >> whole flow path in the
> >> future.
> >>
> >> 6. Sorry for that there is mistakes in this sentence. Let me make it
> clear.
> >>
> >> We proposed to add 'ContentType' to indicates the result is for what
> kind
> >> of sql. In this sentence,
> >> I want to explain why we add 'ContentType' since the ClientParser can
> >> recognize the sql type too.
> >> It is because the proposed ClientParser can't recognize complex syntax.
> >> For example, it can’t
> >> recognize query with CTE. So the result should carry content type
> >> information to help the client to
> >> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
> >> the result is for a
> >> query statement.
> >>
> >> 7.
> >>> What is the serialization lifecycle for results?
> >>
> >> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
> >> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
> >> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
> >> format)
> >> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
> >>
> >>> Maybe the SQL Gateway's return type can be sent as part of the request
> >> so that the
> >> JobManager can send  back results in an advantageous format?
> >>
> >> Yes. I think it's an improvement for the Client and Gateway. We have
> some
> >> ideas. For example,
> >>
> >> 1) We can move the Gateway into the JobManager and reduce the Ser/De
> costs
> >> from JM to Gateway.
> >> 2) Or the Gateway can collect the data from the sink function directly
> >> instead of JobManager.
> >>
> >> But I think we can leave this as a future work and discuss in another
> >> thread.
> >>
> >> 8. Yes.
> >>
> >>> Do you think it'll still be part of the Flink 1.17 release?
> >> Yes. We will try our best to finish the work.
> >>
> >> Feel free to talk to me if I’m wrong or you have any other questions.
> >>
> >>
> >>> 2022年11月25日 11:48，yu zelin <yuzelin....@gmail.com> 写道：
> >>>
> >>> Hi, all
> >>>
> >>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
> >> Client Based on SQL Gateway[1].
> >>> The motivation of this FLIP is that the current SQL Client allows only
> >> local connection which can not satisfy
> >>> the common need of connecting to a remote cluster.
> >>>
> >>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
> >> implement the Remote SQL Client
> >>> based on SQL Gateway. In our design, we proposed two main changes:
> >>>
> >>> 1. New remote mode client which performs connection to the remote
> >> gateway through REST API.
> >>> 2. Migration of the current local mode client. We proposed to refactor
> >> the local client based on SQL Gateway
> >>>   to unify the interface for two modes.
> >>>
> >>> Looking forward to your suggestions.
> >>>
> >>> Best,
> >>> Yu Zelin
> >>>
> >>> [1] https://cwiki.apache.org/confluence/x/T48ODg
> >>> [2] https://cwiki.apache.org/confluence/x/rIyMC
> >>
> >>
>
>

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Reply via email to