Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

yu zelin Fri, 02 Dec 2022 00:09:47 -0800

Hi Jim,

Thanks for your feedback!


> Should this configuration be mentioned in the FLIP?

Sure. 

> some way for the server to be able to limit the number of requests it 
> receives.
I’m sorry that this FLIP is dedicated in implementing the Remote mode, so we
didn't consider much about this. I think the option is enough currently.  I 
will add 
the improvement suggestions to the ‘Future Work’.

> I wonder if two other options are possible

To forward the raw format to gateway and then to client is possible. The raw 
results from sink is in ‘CollectResultIterator#bufferedResult’. First, we can 
find 
a way to get this result without wrapping it. Second, constructing a 
‘InternalTypeInfo’.
We can construct it using the schema information (data’s logical type). After 
construction, we can get the ’TypeSerializer’ to deserialize the raw result.




> 2022年12月1日 04:54，Jim Hughes <[email protected]> 写道：
> 
> Hi Yu,
> 
> Thanks for moving my comments to this thread!  Also, thank you for
> answering my questions; it is helping me understand the SQL Gateway
> better.
> 
> 5.
>> Our idea is to introduce a new session option (like
> 'sql-client.result.fetch-interval') to control
> the fetching requests sending frequency. What do you think?
> 
> Should this configuration be mentioned in the FLIP?
> 
> One slight concern I have with having 'sql-client.result.fetch-interval' as
> a session configuration is that users could set it low and cause the client
> to send a large volume of requests to the SQL gateway.
> 
> Generally, I'd like to see some way for the server to be able to limit the
> number of requests it receives.  If that really needs to be done by a proxy
> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
> think my concern here should be blocking in any way.)
> 
> 7.
>> What is the serialization lifecycle for results?
> 
> I wonder if two other options are possible:
> 3) Could the Gateway just forward the result byte array?  (Or does the
> Gateway need to deserialize the response in order to understand it for some
> reason?)
> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
> the Client read the format which the JobManager sends?)
> 
> Thanks again!
> 
> Cheers,
> 
> Jim
> 
> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <[email protected]> wrote:
> 
>> Hi, all
>> 
>> Thanks Jim’s questions below. Here I’d like to reply to them.
>> 
>>>  1. For the Client Parser, is it going to work with the extended syntax
>>>  from the Flink Table Store?
>>> 
>>>  2. Relatedly, what will happen if an older Client tries to handle
>> syntax
>>>  that a newer service supports?  (Suppose I use a 1.17 client with a
>> 1.18
>>>  Gateway/system which has a new keyword.  Is there anything we should be
>>>  designing for upfront?)
>>> 
>>>  3. How will client and server version mismatches be handled?  Will a
>>>  single gateway be able to support multiple endpoint versions?
>>>  4. How are commands which change a session handled?  Are those sent via
>>>  an ExecuteStatementRequest?
>>> 
>>>  5. The remote POC uses polling for getting back status and getting back
>>>  results.  Would it be possible to switch to web sockets or some other
>>>  mechanism to avoid polling?  If polling is used for both, the polling
>>>  frequency should be different between local and remote configurations.
>>> 
>>>  6. What does this sentence mean?  "The reason why we didn't get the sql
>>>  type in client side is because it's hard for the lightweight
>> client-level
>>>  parser to recognize some sql type  sql, such as query with CTE.  "
>>> 
>>>  7. What is the serialization lifecycle for results?  It makes sense to
>>>  have some control over whether the gateway returns results as SQL or
>> JSON.
>>>  I'd love to see a way to avoid needing to serialize and deserialize
>> results
>>>  on the SQL Gateway if possible.  I'm still new enough to the project
>> that
>>>  I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>> return
>>>  type can be sent as part of the request so that the JobManager can send
>>>  back results in an advantageous format?
>>> 
>>>  8. Does ErrorType need to be marked as @PublicEvolving?
>>> 
>>> I'm excited for the SQL client to support gateway mode!  Given the change
>>> in design, do you think it'll still be part of the Flink 1.17 release?
>> 
>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is because
>> if the
>> sql type is not recognized, the sql will be submitted to the gateway
>> directly.
>> 
>> For more information: Actually, the proposed ClientParser only do two
>> things:
>> (1) Tell client commands (help, clear, etc) and sqls apart.
>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print raw
>> string
>> for the SHOW CREATE result instead of table). Here the recognization of
>> sql types
>> mostly affects the print style, and unrecognized sql also can be submitted
>> to cluster.
>> So the Client with new ClientParser can work compatible with new syntax.
>> 
>> 2. First, I'd like to explain that the gateway APIs and supported syntax
>> is two things.
>> For example, ‘configureSession' and 'completeStatement' are APIs. As
>> mentioned
>> in #1, the sql statements which syntax is unknown will be submitted to the
>> gateway,
>> and whether they can be executed normally depends on whether the execution
>> environment supports the syntax.
>> 
>>> Is there anything we should be designing for upfront?
>> 
>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>> gateway APIs.
>> 
>> 3.
>>> How will client and server version mismatches be handled?
>> 
>> A lower version client can work compatible with a higher version gateway
>> because the
>> old interfaces won’t be deleted. When a higher version client connects to
>> a lower version
>> gateway, the client should notify the users if they try to use unsupported
>> features. For
>> example, the client start option ‘-i’  means using initialization file to
>> initialize the session.
>> We plan to use the gateway’s ‘configureSession’ to implement it. But this
>> API is not
>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>> user try to
>> use ‘-i’ option to start the client with the 1.16 gateway, the client
>> should tell the user that
>> Can’t execute ‘-i’ option with gateway which version is lower than V2.
>> 
>>> Will a single gateway be able to support multiple endpoint versions?
>> 
>> Currently, the gateway only starts a highest version endpoint and the
>> higher version endpoint
>> is compatible with the lower version endpoint’s protocol.
>> 
>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the session
>> configuration.
>> Notice: the client can’t change the session (I mean, close current session
>> and open another
>> one). I’m not sure if you have need to change the session itself?
>> 
>> 5.
>>> Would it be possible to switch to web sockets or some other mechanism
>> to avoid polling?
>> 
>> Your suggestion is very good, but this flip is for supporting the remote
>> client. How about taking
>> it as a future work?
>> 
>>> If polling is used for both, the polling frequency should be different
>> between local and remote
>> configurations.
>> 
>> Our idea is to introduce a new session option (like
>> 'sql-client.result.fetch-interval') to control
>> the fetching requests sending frequency. What do you think?
>> 
>> For more information: we are inclined to keep the polling behavior in this
>> version. For streaming
>> query, fetching results synchronously may occupy resources of the gateway
>> in a long period.
>> For example, if the job doesn’t return results for a long time because the
>> window has not been
>> triggered, the synchronously fetching will keep occupying the connection.
>> In asynchronous
>> situation, the gateway can return a NOT_READY_RESULT quickly and release
>> the resources
>> for other clients to use. I think we can make some improvements for the
>> whole flow path in the
>> future.
>> 
>> 6. Sorry for that there is mistakes in this sentence. Let me make it clear.
>> 
>> We proposed to add 'ContentType' to indicates the result is for what kind
>> of sql. In this sentence,
>> I want to explain why we add 'ContentType' since the ClientParser can
>> recognize the sql type too.
>> It is because the proposed ClientParser can't recognize complex syntax.
>> For example, it can’t
>> recognize query with CTE. So the result should carry content type
>> information to help the client to
>> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
>> the result is for a
>> query statement.
>> 
>> 7.
>>> What is the serialization lifecycle for results?
>> 
>> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
>> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
>> format)
>> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
>> 
>>> Maybe the SQL Gateway's return type can be sent as part of the request
>> so that the
>> JobManager can send  back results in an advantageous format?
>> 
>> Yes. I think it's an improvement for the Client and Gateway. We have some
>> ideas. For example,
>> 
>> 1) We can move the Gateway into the JobManager and reduce the Ser/De costs
>> from JM to Gateway.
>> 2) Or the Gateway can collect the data from the sink function directly
>> instead of JobManager.
>> 
>> But I think we can leave this as a future work and discuss in another
>> thread.
>> 
>> 8. Yes.
>> 
>>> Do you think it'll still be part of the Flink 1.17 release?
>> Yes. We will try our best to finish the work.
>> 
>> Feel free to talk to me if I’m wrong or you have any other questions.
>> 
>> 
>>> 2022年11月25日 11:48，yu zelin <[email protected]> 写道：
>>> 
>>> Hi, all
>>> 
>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
>> Client Based on SQL Gateway[1].
>>> The motivation of this FLIP is that the current SQL Client allows only
>> local connection which can not satisfy
>>> the common need of connecting to a remote cluster.
>>> 
>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
>> implement the Remote SQL Client
>>> based on SQL Gateway. In our design, we proposed two main changes:
>>> 
>>> 1. New remote mode client which performs connection to the remote
>> gateway through REST API.
>>> 2. Migration of the current local mode client. We proposed to refactor
>> the local client based on SQL Gateway
>>>   to unify the interface for two modes.
>>> 
>>> Looking forward to your suggestions.
>>> 
>>> Best,
>>> Yu Zelin
>>> 
>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>> 
>>

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Reply via email to