Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

yu zelin Sun, 04 Dec 2022 19:10:55 -0800

Hi, Shammon

Thanks for your feedback. I think it’s good to support jdbc-sdk. However, 
it's not supported in the gateway side yet. In my opinion, this FLIP is more
concerned with the SQL Client. How about put “supporting jdbc-sdk” in 
‘Future Work’? We can discuss how to implement it in another thread.


Best,
Yu Zelin
> 2022年12月2日 18:12，Shammon FY <[email protected]> 写道：
> 
> Hi zelin
> 
> Thanks for driving this discussion.
> 
> I notice that the sql-client will interact with sql-gateway by `REST
> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
> sql-gateway?
> 
> Then the sql-client can connect the gateway with jdbc-sdk, on the other
> hand, the other applications and tools such as jmeter can use the jdbc-sdk
> to connect sql-gateway too.
> 
> Best,
> Shammon
> 
> 
> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <[email protected]> wrote:
> 
>> Hi Jim,
>> 
>> Thanks for your feedback!
>> 
>>> Should this configuration be mentioned in the FLIP?
>> 
>> Sure.
>> 
>>> some way for the server to be able to limit the number of requests it
>> receives.
>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
>> we
>> didn't consider much about this. I think the option is enough currently.
>> I will add
>> the improvement suggestions to the ‘Future Work’.
>> 
>>> I wonder if two other options are possible
>> 
>> To forward the raw format to gateway and then to client is possible. The
>> raw
>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
>> can find
>> a way to get this result without wrapping it. Second, constructing a
>> ‘InternalTypeInfo’.
>> We can construct it using the schema information (data’s logical type).
>> After
>> construction, we can get the ’TypeSerializer’ to deserialize the raw
>> result.
>> 
>> 
>> 
>> 
>>> 2022年12月1日 04:54，Jim Hughes <[email protected]> 写道：
>>> 
>>> Hi Yu,
>>> 
>>> Thanks for moving my comments to this thread!  Also, thank you for
>>> answering my questions; it is helping me understand the SQL Gateway
>>> better.
>>> 
>>> 5.
>>>> Our idea is to introduce a new session option (like
>>> 'sql-client.result.fetch-interval') to control
>>> the fetching requests sending frequency. What do you think?
>>> 
>>> Should this configuration be mentioned in the FLIP?
>>> 
>>> One slight concern I have with having 'sql-client.result.fetch-interval'
>> as
>>> a session configuration is that users could set it low and cause the
>> client
>>> to send a large volume of requests to the SQL gateway.
>>> 
>>> Generally, I'd like to see some way for the server to be able to limit
>> the
>>> number of requests it receives.  If that really needs to be done by a
>> proxy
>>> in front of the SQL gateway, that is fine as well.  (To be clear, I don't
>>> think my concern here should be blocking in any way.)
>>> 
>>> 7.
>>>> What is the serialization lifecycle for results?
>>> 
>>> I wonder if two other options are possible:
>>> 3) Could the Gateway just forward the result byte array?  (Or does the
>>> Gateway need to deserialize the response in order to understand it for
>> some
>>> reason?)
>>> 4) Could the JobManager prepare the results in JSON?  (Or similarly could
>>> the Client read the format which the JobManager sends?)
>>> 
>>> Thanks again!
>>> 
>>> Cheers,
>>> 
>>> Jim
>>> 
>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <[email protected]> wrote:
>>> 
>>>> Hi, all
>>>> 
>>>> Thanks Jim’s questions below. Here I’d like to reply to them.
>>>> 
>>>>> 1. For the Client Parser, is it going to work with the extended syntax
>>>>> from the Flink Table Store?
>>>>> 
>>>>> 2. Relatedly, what will happen if an older Client tries to handle
>>>> syntax
>>>>> that a newer service supports?  (Suppose I use a 1.17 client with a
>>>> 1.18
>>>>> Gateway/system which has a new keyword.  Is there anything we should
>> be
>>>>> designing for upfront?)
>>>>> 
>>>>> 3. How will client and server version mismatches be handled?  Will a
>>>>> single gateway be able to support multiple endpoint versions?
>>>>> 4. How are commands which change a session handled?  Are those sent
>> via
>>>>> an ExecuteStatementRequest?
>>>>> 
>>>>> 5. The remote POC uses polling for getting back status and getting
>> back
>>>>> results.  Would it be possible to switch to web sockets or some other
>>>>> mechanism to avoid polling?  If polling is used for both, the polling
>>>>> frequency should be different between local and remote configurations.
>>>>> 
>>>>> 6. What does this sentence mean?  "The reason why we didn't get the
>> sql
>>>>> type in client side is because it's hard for the lightweight
>>>> client-level
>>>>> parser to recognize some sql type  sql, such as query with CTE.  "
>>>>> 
>>>>> 7. What is the serialization lifecycle for results?  It makes sense to
>>>>> have some control over whether the gateway returns results as SQL or
>>>> JSON.
>>>>> I'd love to see a way to avoid needing to serialize and deserialize
>>>> results
>>>>> on the SQL Gateway if possible.  I'm still new enough to the project
>>>> that
>>>>> I'm not sure if that's readily possible.  Maybe the SQL Gateway's
>>>> return
>>>>> type can be sent as part of the request so that the JobManager can
>> send
>>>>> back results in an advantageous format?
>>>>> 
>>>>> 8. Does ErrorType need to be marked as @PublicEvolving?
>>>>> 
>>>>> I'm excited for the SQL client to support gateway mode!  Given the
>> change
>>>>> in design, do you think it'll still be part of the Flink 1.17 release?
>>>> 
>>>> 1.  ClientParser can work with new (and unknown) SQL syntax. It is
>> because
>>>> if the
>>>> sql type is not recognized, the sql will be submitted to the gateway
>>>> directly.
>>>> 
>>>> For more information: Actually, the proposed ClientParser only do two
>>>> things:
>>>> (1) Tell client commands (help, clear, etc) and sqls apart.
>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can print
>> raw
>>>> string
>>>> for the SHOW CREATE result instead of table). Here the recognization of
>>>> sql types
>>>> mostly affects the print style, and unrecognized sql also can be
>> submitted
>>>> to cluster.
>>>> So the Client with new ClientParser can work compatible with new syntax.
>>>> 
>>>> 2. First, I'd like to explain that the gateway APIs and supported syntax
>>>> is two things.
>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As
>>>> mentioned
>>>> in #1, the sql statements which syntax is unknown will be submitted to
>> the
>>>> gateway,
>>>> and whether they can be executed normally depends on whether the
>> execution
>>>> environment supports the syntax.
>>>> 
>>>>> Is there anything we should be designing for upfront?
>>>> 
>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
>>>> gateway APIs.
>>>> 
>>>> 3.
>>>>> How will client and server version mismatches be handled?
>>>> 
>>>> A lower version client can work compatible with a higher version gateway
>>>> because the
>>>> old interfaces won’t be deleted. When a higher version client connects
>> to
>>>> a lower version
>>>> gateway, the client should notify the users if they try to use
>> unsupported
>>>> features. For
>>>> example, the client start option ‘-i’  means using initialization file
>> to
>>>> initialize the session.
>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But
>> this
>>>> API is not
>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
>>>> user try to
>>>> use ‘-i’ option to start the client with the 1.16 gateway, the client
>>>> should tell the user that
>>>> Can’t execute ‘-i’ option with gateway which version is lower than V2.
>>>> 
>>>>> Will a single gateway be able to support multiple endpoint versions?
>>>> 
>>>> Currently, the gateway only starts a highest version endpoint and the
>>>> higher version endpoint
>>>> is compatible with the lower version endpoint’s protocol.
>>>> 
>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the
>> session
>>>> configuration.
>>>> Notice: the client can’t change the session (I mean, close current
>> session
>>>> and open another
>>>> one). I’m not sure if you have need to change the session itself?
>>>> 
>>>> 5.
>>>>> Would it be possible to switch to web sockets or some other mechanism
>>>> to avoid polling?
>>>> 
>>>> Your suggestion is very good, but this flip is for supporting the remote
>>>> client. How about taking
>>>> it as a future work?
>>>> 
>>>>> If polling is used for both, the polling frequency should be different
>>>> between local and remote
>>>> configurations.
>>>> 
>>>> Our idea is to introduce a new session option (like
>>>> 'sql-client.result.fetch-interval') to control
>>>> the fetching requests sending frequency. What do you think?
>>>> 
>>>> For more information: we are inclined to keep the polling behavior in
>> this
>>>> version. For streaming
>>>> query, fetching results synchronously may occupy resources of the
>> gateway
>>>> in a long period.
>>>> For example, if the job doesn’t return results for a long time because
>> the
>>>> window has not been
>>>> triggered, the synchronously fetching will keep occupying the
>> connection.
>>>> In asynchronous
>>>> situation, the gateway can return a NOT_READY_RESULT quickly and release
>>>> the resources
>>>> for other clients to use. I think we can make some improvements for the
>>>> whole flow path in the
>>>> future.
>>>> 
>>>> 6. Sorry for that there is mistakes in this sentence. Let me make it
>> clear.
>>>> 
>>>> We proposed to add 'ContentType' to indicates the result is for what
>> kind
>>>> of sql. In this sentence,
>>>> I want to explain why we add 'ContentType' since the ClientParser can
>>>> recognize the sql type too.
>>>> It is because the proposed ClientParser can't recognize complex syntax.
>>>> For example, it can’t
>>>> recognize query with CTE. So the result should carry content type
>>>> information to help the client to
>>>> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
>>>> the result is for a
>>>> query statement.
>>>> 
>>>> 7.
>>>>> What is the serialization lifecycle for results?
>>>> 
>>>> 1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
>>>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
>>>> 3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
>>>> format)
>>>> 4) Client receiving               : Byte[ ] -> RowData (deserialize)
>>>> 
>>>>> Maybe the SQL Gateway's return type can be sent as part of the request
>>>> so that the
>>>> JobManager can send  back results in an advantageous format?
>>>> 
>>>> Yes. I think it's an improvement for the Client and Gateway. We have
>> some
>>>> ideas. For example,
>>>> 
>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De
>> costs
>>>> from JM to Gateway.
>>>> 2) Or the Gateway can collect the data from the sink function directly
>>>> instead of JobManager.
>>>> 
>>>> But I think we can leave this as a future work and discuss in another
>>>> thread.
>>>> 
>>>> 8. Yes.
>>>> 
>>>>> Do you think it'll still be part of the Flink 1.17 release?
>>>> Yes. We will try our best to finish the work.
>>>> 
>>>> Feel free to talk to me if I’m wrong or you have any other questions.
>>>> 
>>>> 
>>>>> 2022年11月25日 11:48，yu zelin <[email protected]> 写道：
>>>>> 
>>>>> Hi, all
>>>>> 
>>>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL
>>>> Client Based on SQL Gateway[1].
>>>>> The motivation of this FLIP is that the current SQL Client allows only
>>>> local connection which can not satisfy
>>>>> the common need of connecting to a remote cluster.
>>>>> 
>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to
>>>> implement the Remote SQL Client
>>>>> based on SQL Gateway. In our design, we proposed two main changes:
>>>>> 
>>>>> 1. New remote mode client which performs connection to the remote
>>>> gateway through REST API.
>>>>> 2. Migration of the current local mode client. We proposed to refactor
>>>> the local client based on SQL Gateway
>>>>>  to unify the interface for two modes.
>>>>> 
>>>>> Looking forward to your suggestions.
>>>>> 
>>>>> Best,
>>>>> Yu Zelin
>>>>> 
>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg
>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC
>>>> 
>>>> 
>> 
>>

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Reply via email to