Hi everyone, Sorry for the incorrect message in my last email. I want to start the vote on Wednesday as long as there are no questions in this period.
Best, Yu Zelin On Tue, Dec 13, 2022 at 5:08 PM yu zelin <yuzelin....@gmail.com> wrote: > Hi, everyone, > > Looks like our new design is similar to Timo’s suggestion, and considering > that there has > no response from other devs for a long time, I want to start the vote on > Thursday. > > > Best, > Yu Zelin > > 2022年12月13日 16:23,yu zelin <yuzelin....@gmail.com> 写道: > > Hi, Timo, > > Thanks for your suggestion. Recently I have discussed with @Godfrey He, > @Shengkai Fang > and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We > finally came to > a consensus which is similar to your suggestion. The details are as > follows: > > 1. Add a REST query parameter ‘*RowFormat*’ = JSON/PLAIN_TEXT to tell the > REST Endpoint > how to deserialize the RowData int ResultSet. > > JSON format means the RowData will be serialized to JSON format, which > contains original > * LogicalType* information, so it can be deserialized back to RowData. > > PLAIN_TEXT format means the RowData will be serialized to > SQL-compliant, plain strings. > The SQL Client can print the strings directly. > > The example URI for fetching results is: > > /v2/sessions/:session_handle/operations/:operation_handle/result/:token? > rowFormat=PLAIN_TEXT > > 2. Introduce two response bodies for fetching results in two formats. > > For more details, please take a look at the FLIP [ > https://cwiki.apache.org/confluence/x/T48ODg]. > I have updated it with an example of query response bodies in two format > in section: > Public Interface -> REST Endpoint Modification. > > 2022年12月12日 18:09,Timo Walther <twal...@apache.org> 写道: > > Hi everyone, > > sorry to jump into this discussion so late. > > > So we decided to revert the RowFormat related changes and let the client > to resolve the print format. > > Could you elaborate a bit on this topic in the FLIP? I still believe that > we need 2 types of output formats. > > Format A: for the SQL Client CLI and other interactive notebooks that just > uses SQL CAST(... AS STRING) semantics executed on the server side > > Format B: for JDBC SDK or other machine-readable downstream libraries > > Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string > representation depends on a session configuration option. Clients might not > be aware of this session option, so the formatting must happen on the > server side. > > However, when the downstream consumer is a library, maybe the library > would like to get the raw millis/nanos since epoch. > > Also nested rows and collections might be better encoded with format B for > libraries but interactive sessions are happy if nested types are already > formatted server-side, so not every client needs custom code for the > formatting. > > Regards, > Timo > > > > On 06.12.22 15:13, godfrey he wrote: > > Hi, zeklin > > The CLI will use default print style for the non-query result. > > Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE > commands are clear. > > We think it’s better to add the root cause to the ErrorResponseBody. > > LGTM > Best, > Godfrey > yu zelin <yuzelin....@gmail.com> 于2022年12月6日周二 17:51写道: > > > Hi, Godfrey > > Thanks for your feedback. Below is my thoughts about your questions. > > 1. About RowFormat. > I agree to your opinion. So we decided to revert the RowFormat related > changes > and let the client to resolve the print format. > > 2. About ContentType > I agree that the definition of the ContentType is not clear. But how to > define the > statement type is another big question. So, we decided to only tell the > query result > and non-query result apart. The CLI will use default print style for the > non-query > result. > > 3. About ErrorHandling > I think reuse the current ErrorResponseBody is good, but parse the root > cause > from the exception stack strings is quite hacking. We think it’s better to > add the > root cause to the ErrorResponseBody. > > 4. About Runtime REST API Modifications > I agree, too. This part is moved to the ‘Future Work’. > > Best, > Yu Zelin > > > 2022年12月5日 18:33,godfrey he <godfre...@gmail.com> 写道: > > Hi Zelin, > > Thanks for driving this discussion. > > I have a few comments, > > Add RowFormat to ResultSet to indicate the format of rows. > > We should not require SqlGateway server to meet the display > requirements of a CliClient. > Because different CliClients may have different display style. The > server just need to response the data, > and the CliClient prints the result as needed. So RowFormat is not needed. > > Add ContentType to ResultSet to indicate what kind of data the result > contains. > > from my first sight, the values of ContentType are intersected, such > as: A select query will return QUERY_RESULT, > but it also has JOB_ID. OTHER is too ambiguous, I don't know which > kind of query will return OTHER. > I recommend returning the concrete type for each statement, such as > "CREATE TABLE" for "create table xx (...) with ()", > "SELECT" for "select * from xxx". The statement type can be maintained > in `Operation`s. > > Error Handling > > I think current design of error handling mechanism can meet the > requirement of CliClient, we can get the root cause from > the stack (see ErrorResponseBody#errors). If it becomes a common > requirement (for many clients) in the future, > we can introduce this interface. > > Runtime REST API Modification for Local Client Migration > > I think this part is over-engineered, this part belongs to optimization. > The client does not require very high performance, the current design > can already meet our needs. > If we find performance problems in the future, do such optimizations. > > Best, > Godfrey > > yu zelin <yuzelin....@gmail.com> 于2022年12月5日周一 11:11写道: > > > Hi, Shammon > > Thanks for your feedback. I think it’s good to support jdbc-sdk. However, > it's not supported in the gateway side yet. In my opinion, this FLIP is > more > concerned with the SQL Client. How about put “supporting jdbc-sdk” in > ‘Future Work’? We can discuss how to implement it in another thread. > > Best, > Yu Zelin > > 2022年12月2日 18:12,Shammon FY <zjur...@gmail.com> 写道: > > Hi zelin > > Thanks for driving this discussion. > > I notice that the sql-client will interact with sql-gateway by `REST > Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for > sql-gateway? > > Then the sql-client can connect the gateway with jdbc-sdk, on the other > hand, the other applications and tools such as jmeter can use the jdbc-sdk > to connect sql-gateway too. > > Best, > Shammon > > > On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin....@gmail.com> wrote: > > Hi Jim, > > Thanks for your feedback! > > Should this configuration be mentioned in the FLIP? > > > Sure. > > some way for the server to be able to limit the number of requests it > > receives. > I’m sorry that this FLIP is dedicated in implementing the Remote mode, so > we > didn't consider much about this. I think the option is enough currently. > I will add > the improvement suggestions to the ‘Future Work’. > > I wonder if two other options are possible > > > To forward the raw format to gateway and then to client is possible. The > raw > results from sink is in ‘CollectResultIterator#bufferedResult’. First, we > can find > a way to get this result without wrapping it. Second, constructing a > ‘InternalTypeInfo’. > We can construct it using the schema information (data’s logical type). > After > construction, we can get the ’TypeSerializer’ to deserialize the raw > result. > > > > > 2022年12月1日 04:54,Jim Hughes <jhug...@confluent.io.INVALID> 写道: > > Hi Yu, > > Thanks for moving my comments to this thread! Also, thank you for > answering my questions; it is helping me understand the SQL Gateway > better. > > 5. > > Our idea is to introduce a new session option (like > > 'sql-client.result.fetch-interval') to control > the fetching requests sending frequency. What do you think? > > Should this configuration be mentioned in the FLIP? > > One slight concern I have with having 'sql-client.result.fetch-interval' > > as > > a session configuration is that users could set it low and cause the > > client > > to send a large volume of requests to the SQL gateway. > > Generally, I'd like to see some way for the server to be able to limit > > the > > number of requests it receives. If that really needs to be done by a > > proxy > > in front of the SQL gateway, that is fine as well. (To be clear, I don't > think my concern here should be blocking in any way.) > > 7. > > What is the serialization lifecycle for results? > > > I wonder if two other options are possible: > 3) Could the Gateway just forward the result byte array? (Or does the > Gateway need to deserialize the response in order to understand it for > > some > > reason?) > 4) Could the JobManager prepare the results in JSON? (Or similarly could > the Client read the format which the JobManager sends?) > > Thanks again! > > Cheers, > > Jim > > On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin....@gmail.com> wrote: > > Hi, all > > Thanks Jim’s questions below. Here I’d like to reply to them. > > 1. For the Client Parser, is it going to work with the extended syntax > from the Flink Table Store? > > 2. Relatedly, what will happen if an older Client tries to handle > > syntax > > that a newer service supports? (Suppose I use a 1.17 client with a > > 1.18 > > Gateway/system which has a new keyword. Is there anything we should > > be > > designing for upfront?) > > 3. How will client and server version mismatches be handled? Will a > single gateway be able to support multiple endpoint versions? > 4. How are commands which change a session handled? Are those sent > > via > > an ExecuteStatementRequest? > > 5. The remote POC uses polling for getting back status and getting > > back > > results. Would it be possible to switch to web sockets or some other > mechanism to avoid polling? If polling is used for both, the polling > frequency should be different between local and remote configurations. > > 6. What does this sentence mean? "The reason why we didn't get the > > sql > > type in client side is because it's hard for the lightweight > > client-level > > parser to recognize some sql type sql, such as query with CTE. " > > 7. What is the serialization lifecycle for results? It makes sense to > have some control over whether the gateway returns results as SQL or > > JSON. > > I'd love to see a way to avoid needing to serialize and deserialize > > results > > on the SQL Gateway if possible. I'm still new enough to the project > > that > > I'm not sure if that's readily possible. Maybe the SQL Gateway's > > return > > type can be sent as part of the request so that the JobManager can > > send > > back results in an advantageous format? > > 8. Does ErrorType need to be marked as @PublicEvolving? > > I'm excited for the SQL client to support gateway mode! Given the > > change > > in design, do you think it'll still be part of the Flink 1.17 release? > > > 1. ClientParser can work with new (and unknown) SQL syntax. It is > > because > > if the > sql type is not recognized, the sql will be submitted to the gateway > directly. > > For more information: Actually, the proposed ClientParser only do two > things: > (1) Tell client commands (help, clear, etc) and sqls apart. > (2) parses several sql types (e.g. SHOW CREATE statement, we can print > > raw > > string > for the SHOW CREATE result instead of table). Here the recognization of > sql types > mostly affects the print style, and unrecognized sql also can be > > submitted > > to cluster. > So the Client with new ClientParser can work compatible with new syntax. > > 2. First, I'd like to explain that the gateway APIs and supported syntax > is two things. > For example, ‘configureSession' and 'completeStatement' are APIs. As > mentioned > in #1, the sql statements which syntax is unknown will be submitted to > > the > > gateway, > and whether they can be executed normally depends on whether the > > execution > > environment supports the syntax. > > Is there anything we should be designing for upfront? > > > The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql > gateway APIs. > > 3. > > How will client and server version mismatches be handled? > > > A lower version client can work compatible with a higher version gateway > because the > old interfaces won’t be deleted. When a higher version client connects > > to > > a lower version > gateway, the client should notify the users if they try to use > > unsupported > > features. For > example, the client start option ‘-i’ means using initialization file > > to > > initialize the session. > We plan to use the gateway’s ‘configureSession’ to implement it. But > > this > > API is not > implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the > user try to > use ‘-i’ option to start the client with the 1.16 gateway, the client > should tell the user that > Can’t execute ‘-i’ option with gateway which version is lower than V2. > > Will a single gateway be able to support multiple endpoint versions? > > > Currently, the gateway only starts a highest version endpoint and the > higher version endpoint > is compatible with the lower version endpoint’s protocol. > > 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the > > session > > configuration. > Notice: the client can’t change the session (I mean, close current > > session > > and open another > one). I’m not sure if you have need to change the session itself? > > 5. > > Would it be possible to switch to web sockets or some other mechanism > > to avoid polling? > > Your suggestion is very good, but this flip is for supporting the remote > client. How about taking > it as a future work? > > If polling is used for both, the polling frequency should be different > > between local and remote > configurations. > > Our idea is to introduce a new session option (like > 'sql-client.result.fetch-interval') to control > the fetching requests sending frequency. What do you think? > > For more information: we are inclined to keep the polling behavior in > > this > > version. For streaming > query, fetching results synchronously may occupy resources of the > > gateway > > in a long period. > For example, if the job doesn’t return results for a long time because > > the > > window has not been > triggered, the synchronously fetching will keep occupying the > > connection. > > In asynchronous > situation, the gateway can return a NOT_READY_RESULT quickly and release > the resources > for other clients to use. I think we can make some improvements for the > whole flow path in the > future. > > 6. Sorry for that there is mistakes in this sentence. Let me make it > > clear. > > > We proposed to add 'ContentType' to indicates the result is for what > > kind > > of sql. In this sentence, > I want to explain why we add 'ContentType' since the ClientParser can > recognize the sql type too. > It is because the proposed ClientParser can't recognize complex syntax. > For example, it can’t > recognize query with CTE. So the result should carry content type > information to help the client to > know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates > the result is for a > query statement. > > 7. > > What is the serialization lifecycle for results? > > > 1) Sink to JobManager : RowData -> Byte[ ] (serialize) > 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize) > 3) Gateway sending : RowData -> Byte[ ] (serialized to JSON > format) > 4) Client receiving : Byte[ ] -> RowData (deserialize) > > Maybe the SQL Gateway's return type can be sent as part of the request > > so that the > JobManager can send back results in an advantageous format? > > Yes. I think it's an improvement for the Client and Gateway. We have > > some > > ideas. For example, > > 1) We can move the Gateway into the JobManager and reduce the Ser/De > > costs > > from JM to Gateway. > 2) Or the Gateway can collect the data from the sink function directly > instead of JobManager. > > But I think we can leave this as a future work and discuss in another > thread. > > 8. Yes. > > Do you think it'll still be part of the Flink 1.17 release? > > Yes. We will try our best to finish the work. > > Feel free to talk to me if I’m wrong or you have any other questions. > > > 2022年11月25日 11:48,yu zelin <yuzelin....@gmail.com> 写道: > > Hi, all > > I want to initiate a discussion on the FLIP-275: Support Remote SQL > > Client Based on SQL Gateway[1]. > > The motivation of this FLIP is that the current SQL Client allows only > > local connection which can not satisfy > > the common need of connecting to a remote cluster. > > Since the FLIP-91[2] has introduced SQL Gateway, we proposed to > > implement the Remote SQL Client > > based on SQL Gateway. In our design, we proposed two main changes: > > 1. New remote mode client which performs connection to the remote > > gateway through REST API. > > 2. Migration of the current local mode client. We proposed to refactor > > the local client based on SQL Gateway > > to unify the interface for two modes. > > Looking forward to your suggestions. > > Best, > Yu Zelin > > [1] https://cwiki.apache.org/confluence/x/T48ODg > [2] https://cwiki.apache.org/confluence/x/rIyMC > > > > > > > > > > >