Hi zelin Thanks for driving this discussion.
I notice that the sql-client will interact with sql-gateway by `REST Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for sql-gateway? Then the sql-client can connect the gateway with jdbc-sdk, on the other hand, the other applications and tools such as jmeter can use the jdbc-sdk to connect sql-gateway too. Best, Shammon On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin....@gmail.com> wrote: > Hi Jim, > > Thanks for your feedback! > > > Should this configuration be mentioned in the FLIP? > > Sure. > > > some way for the server to be able to limit the number of requests it > receives. > I’m sorry that this FLIP is dedicated in implementing the Remote mode, so > we > didn't consider much about this. I think the option is enough currently. > I will add > the improvement suggestions to the ‘Future Work’. > > > I wonder if two other options are possible > > To forward the raw format to gateway and then to client is possible. The > raw > results from sink is in ‘CollectResultIterator#bufferedResult’. First, we > can find > a way to get this result without wrapping it. Second, constructing a > ‘InternalTypeInfo’. > We can construct it using the schema information (data’s logical type). > After > construction, we can get the ’TypeSerializer’ to deserialize the raw > result. > > > > > > 2022年12月1日 04:54,Jim Hughes <jhug...@confluent.io.INVALID> 写道: > > > > Hi Yu, > > > > Thanks for moving my comments to this thread! Also, thank you for > > answering my questions; it is helping me understand the SQL Gateway > > better. > > > > 5. > >> Our idea is to introduce a new session option (like > > 'sql-client.result.fetch-interval') to control > > the fetching requests sending frequency. What do you think? > > > > Should this configuration be mentioned in the FLIP? > > > > One slight concern I have with having 'sql-client.result.fetch-interval' > as > > a session configuration is that users could set it low and cause the > client > > to send a large volume of requests to the SQL gateway. > > > > Generally, I'd like to see some way for the server to be able to limit > the > > number of requests it receives. If that really needs to be done by a > proxy > > in front of the SQL gateway, that is fine as well. (To be clear, I don't > > think my concern here should be blocking in any way.) > > > > 7. > >> What is the serialization lifecycle for results? > > > > I wonder if two other options are possible: > > 3) Could the Gateway just forward the result byte array? (Or does the > > Gateway need to deserialize the response in order to understand it for > some > > reason?) > > 4) Could the JobManager prepare the results in JSON? (Or similarly could > > the Client read the format which the JobManager sends?) > > > > Thanks again! > > > > Cheers, > > > > Jim > > > > On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin....@gmail.com> wrote: > > > >> Hi, all > >> > >> Thanks Jim’s questions below. Here I’d like to reply to them. > >> > >>> 1. For the Client Parser, is it going to work with the extended syntax > >>> from the Flink Table Store? > >>> > >>> 2. Relatedly, what will happen if an older Client tries to handle > >> syntax > >>> that a newer service supports? (Suppose I use a 1.17 client with a > >> 1.18 > >>> Gateway/system which has a new keyword. Is there anything we should > be > >>> designing for upfront?) > >>> > >>> 3. How will client and server version mismatches be handled? Will a > >>> single gateway be able to support multiple endpoint versions? > >>> 4. How are commands which change a session handled? Are those sent > via > >>> an ExecuteStatementRequest? > >>> > >>> 5. The remote POC uses polling for getting back status and getting > back > >>> results. Would it be possible to switch to web sockets or some other > >>> mechanism to avoid polling? If polling is used for both, the polling > >>> frequency should be different between local and remote configurations. > >>> > >>> 6. What does this sentence mean? "The reason why we didn't get the > sql > >>> type in client side is because it's hard for the lightweight > >> client-level > >>> parser to recognize some sql type sql, such as query with CTE. " > >>> > >>> 7. What is the serialization lifecycle for results? It makes sense to > >>> have some control over whether the gateway returns results as SQL or > >> JSON. > >>> I'd love to see a way to avoid needing to serialize and deserialize > >> results > >>> on the SQL Gateway if possible. I'm still new enough to the project > >> that > >>> I'm not sure if that's readily possible. Maybe the SQL Gateway's > >> return > >>> type can be sent as part of the request so that the JobManager can > send > >>> back results in an advantageous format? > >>> > >>> 8. Does ErrorType need to be marked as @PublicEvolving? > >>> > >>> I'm excited for the SQL client to support gateway mode! Given the > change > >>> in design, do you think it'll still be part of the Flink 1.17 release? > >> > >> 1. ClientParser can work with new (and unknown) SQL syntax. It is > because > >> if the > >> sql type is not recognized, the sql will be submitted to the gateway > >> directly. > >> > >> For more information: Actually, the proposed ClientParser only do two > >> things: > >> (1) Tell client commands (help, clear, etc) and sqls apart. > >> (2) parses several sql types (e.g. SHOW CREATE statement, we can print > raw > >> string > >> for the SHOW CREATE result instead of table). Here the recognization of > >> sql types > >> mostly affects the print style, and unrecognized sql also can be > submitted > >> to cluster. > >> So the Client with new ClientParser can work compatible with new syntax. > >> > >> 2. First, I'd like to explain that the gateway APIs and supported syntax > >> is two things. > >> For example, ‘configureSession' and 'completeStatement' are APIs. As > >> mentioned > >> in #1, the sql statements which syntax is unknown will be submitted to > the > >> gateway, > >> and whether they can be executed normally depends on whether the > execution > >> environment supports the syntax. > >> > >>> Is there anything we should be designing for upfront? > >> > >> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql > >> gateway APIs. > >> > >> 3. > >>> How will client and server version mismatches be handled? > >> > >> A lower version client can work compatible with a higher version gateway > >> because the > >> old interfaces won’t be deleted. When a higher version client connects > to > >> a lower version > >> gateway, the client should notify the users if they try to use > unsupported > >> features. For > >> example, the client start option ‘-i’ means using initialization file > to > >> initialize the session. > >> We plan to use the gateway’s ‘configureSession’ to implement it. But > this > >> API is not > >> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the > >> user try to > >> use ‘-i’ option to start the client with the 1.16 gateway, the client > >> should tell the user that > >> Can’t execute ‘-i’ option with gateway which version is lower than V2. > >> > >>> Will a single gateway be able to support multiple endpoint versions? > >> > >> Currently, the gateway only starts a highest version endpoint and the > >> higher version endpoint > >> is compatible with the lower version endpoint’s protocol. > >> > >> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the > session > >> configuration. > >> Notice: the client can’t change the session (I mean, close current > session > >> and open another > >> one). I’m not sure if you have need to change the session itself? > >> > >> 5. > >>> Would it be possible to switch to web sockets or some other mechanism > >> to avoid polling? > >> > >> Your suggestion is very good, but this flip is for supporting the remote > >> client. How about taking > >> it as a future work? > >> > >>> If polling is used for both, the polling frequency should be different > >> between local and remote > >> configurations. > >> > >> Our idea is to introduce a new session option (like > >> 'sql-client.result.fetch-interval') to control > >> the fetching requests sending frequency. What do you think? > >> > >> For more information: we are inclined to keep the polling behavior in > this > >> version. For streaming > >> query, fetching results synchronously may occupy resources of the > gateway > >> in a long period. > >> For example, if the job doesn’t return results for a long time because > the > >> window has not been > >> triggered, the synchronously fetching will keep occupying the > connection. > >> In asynchronous > >> situation, the gateway can return a NOT_READY_RESULT quickly and release > >> the resources > >> for other clients to use. I think we can make some improvements for the > >> whole flow path in the > >> future. > >> > >> 6. Sorry for that there is mistakes in this sentence. Let me make it > clear. > >> > >> We proposed to add 'ContentType' to indicates the result is for what > kind > >> of sql. In this sentence, > >> I want to explain why we add 'ContentType' since the ClientParser can > >> recognize the sql type too. > >> It is because the proposed ClientParser can't recognize complex syntax. > >> For example, it can’t > >> recognize query with CTE. So the result should carry content type > >> information to help the client to > >> know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates > >> the result is for a > >> query statement. > >> > >> 7. > >>> What is the serialization lifecycle for results? > >> > >> 1) Sink to JobManager : RowData -> Byte[ ] (serialize) > >> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize) > >> 3) Gateway sending : RowData -> Byte[ ] (serialized to JSON > >> format) > >> 4) Client receiving : Byte[ ] -> RowData (deserialize) > >> > >>> Maybe the SQL Gateway's return type can be sent as part of the request > >> so that the > >> JobManager can send back results in an advantageous format? > >> > >> Yes. I think it's an improvement for the Client and Gateway. We have > some > >> ideas. For example, > >> > >> 1) We can move the Gateway into the JobManager and reduce the Ser/De > costs > >> from JM to Gateway. > >> 2) Or the Gateway can collect the data from the sink function directly > >> instead of JobManager. > >> > >> But I think we can leave this as a future work and discuss in another > >> thread. > >> > >> 8. Yes. > >> > >>> Do you think it'll still be part of the Flink 1.17 release? > >> Yes. We will try our best to finish the work. > >> > >> Feel free to talk to me if I’m wrong or you have any other questions. > >> > >> > >>> 2022年11月25日 11:48,yu zelin <yuzelin....@gmail.com> 写道: > >>> > >>> Hi, all > >>> > >>> I want to initiate a discussion on the FLIP-275: Support Remote SQL > >> Client Based on SQL Gateway[1]. > >>> The motivation of this FLIP is that the current SQL Client allows only > >> local connection which can not satisfy > >>> the common need of connecting to a remote cluster. > >>> > >>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to > >> implement the Remote SQL Client > >>> based on SQL Gateway. In our design, we proposed two main changes: > >>> > >>> 1. New remote mode client which performs connection to the remote > >> gateway through REST API. > >>> 2. Migration of the current local mode client. We proposed to refactor > >> the local client based on SQL Gateway > >>> to unify the interface for two modes. > >>> > >>> Looking forward to your suggestions. > >>> > >>> Best, > >>> Yu Zelin > >>> > >>> [1] https://cwiki.apache.org/confluence/x/T48ODg > >>> [2] https://cwiki.apache.org/confluence/x/rIyMC > >> > >> > >