Hi, everyone, Looks like our new design is similar to Timo’s suggestion, and considering that there has no response from other devs for a long time, I want to start the vote on Thursday.
Best, Yu Zelin > 2022年12月13日 16:23,yu zelin <yuzelin....@gmail.com> 写道: > > Hi, Timo, > > Thanks for your suggestion. Recently I have discussed with @Godfrey He, > @Shengkai Fang > and @Jark Wu about the `RowFormat` (Thanks for all your suggestions). We > finally came to > a consensus which is similar to your suggestion. The details are as follows: > > 1. Add a REST query parameter ‘RowFormat’ = JSON/PLAIN_TEXT to tell the REST > Endpoint > how to deserialize the RowData int ResultSet. > > JSON format means the RowData will be serialized to JSON format, which > contains original > LogicalType information, so it can be deserialized back to RowData. > > PLAIN_TEXT format means the RowData will be serialized to SQL-compliant, > plain strings. > The SQL Client can print the strings directly. > > The example URI for fetching results is: > > /v2/sessions/:session_handle/operations/:operation_handle/result/:token?rowFormat=PLAIN_TEXT > > 2. Introduce two response bodies for fetching results in two formats. > > For more details, please take a look at the FLIP > [https://cwiki.apache.org/confluence/x/T48ODg]. > I have updated it with an example of query response bodies in two format in > section: > Public Interface -> REST Endpoint Modification. > >> 2022年12月12日 18:09,Timo Walther <twal...@apache.org> 写道: >> >> Hi everyone, >> >> sorry to jump into this discussion so late. >> >> > So we decided to revert the RowFormat related changes and let the client >> > to resolve the print format. >> >> Could you elaborate a bit on this topic in the FLIP? I still believe that we >> need 2 types of output formats. >> >> Format A: for the SQL Client CLI and other interactive notebooks that just >> uses SQL CAST(... AS STRING) semantics executed on the server side >> >> Format B: for JDBC SDK or other machine-readable downstream libraries >> >> Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The string >> representation depends on a session configuration option. Clients might not >> be aware of this session option, so the formatting must happen on the server >> side. >> >> However, when the downstream consumer is a library, maybe the library would >> like to get the raw millis/nanos since epoch. >> >> Also nested rows and collections might be better encoded with format B for >> libraries but interactive sessions are happy if nested types are already >> formatted server-side, so not every client needs custom code for the >> formatting. >> >> Regards, >> Timo >> >> >> >> On 06.12.22 15:13, godfrey he wrote: >>> Hi, zeklin >>>> The CLI will use default print style for the non-query result. >>> Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE >>> commands are clear. >>>> We think it’s better to add the root cause to the ErrorResponseBody. >>> LGTM >>> Best, >>> Godfrey >>> yu zelin <yuzelin....@gmail.com> 于2022年12月6日周二 17:51写道: >>>> >>>> Hi, Godfrey >>>> >>>> Thanks for your feedback. Below is my thoughts about your questions. >>>> >>>> 1. About RowFormat. >>>> I agree to your opinion. So we decided to revert the RowFormat related >>>> changes >>>> and let the client to resolve the print format. >>>> >>>> 2. About ContentType >>>> I agree that the definition of the ContentType is not clear. But how to >>>> define the >>>> statement type is another big question. So, we decided to only tell the >>>> query result >>>> and non-query result apart. The CLI will use default print style for the >>>> non-query >>>> result. >>>> >>>> 3. About ErrorHandling >>>> I think reuse the current ErrorResponseBody is good, but parse the root >>>> cause >>>> from the exception stack strings is quite hacking. We think it’s better to >>>> add the >>>> root cause to the ErrorResponseBody. >>>> >>>> 4. About Runtime REST API Modifications >>>> I agree, too. This part is moved to the ‘Future Work’. >>>> >>>> Best, >>>> Yu Zelin >>>> >>>> >>>>> 2022年12月5日 18:33,godfrey he <godfre...@gmail.com> 写道: >>>>> >>>>> Hi Zelin, >>>>> >>>>> Thanks for driving this discussion. >>>>> >>>>> I have a few comments, >>>>> >>>>>> Add RowFormat to ResultSet to indicate the format of rows. >>>>> We should not require SqlGateway server to meet the display >>>>> requirements of a CliClient. >>>>> Because different CliClients may have different display style. The >>>>> server just need to response the data, >>>>> and the CliClient prints the result as needed. So RowFormat is not needed. >>>>> >>>>>> Add ContentType to ResultSet to indicate what kind of data the result >>>>>> contains. >>>>> from my first sight, the values of ContentType are intersected, such >>>>> as: A select query will return QUERY_RESULT, >>>>> but it also has JOB_ID. OTHER is too ambiguous, I don't know which >>>>> kind of query will return OTHER. >>>>> I recommend returning the concrete type for each statement, such as >>>>> "CREATE TABLE" for "create table xx (...) with ()", >>>>> "SELECT" for "select * from xxx". The statement type can be maintained >>>>> in `Operation`s. >>>>> >>>>>> Error Handling >>>>> I think current design of error handling mechanism can meet the >>>>> requirement of CliClient, we can get the root cause from >>>>> the stack (see ErrorResponseBody#errors). If it becomes a common >>>>> requirement (for many clients) in the future, >>>>> we can introduce this interface. >>>>> >>>>>> Runtime REST API Modification for Local Client Migration >>>>> I think this part is over-engineered, this part belongs to optimization. >>>>> The client does not require very high performance, the current design >>>>> can already meet our needs. >>>>> If we find performance problems in the future, do such optimizations. >>>>> >>>>> Best, >>>>> Godfrey >>>>> >>>>> yu zelin <yuzelin....@gmail.com> 于2022年12月5日周一 11:11写道: >>>>>> >>>>>> Hi, Shammon >>>>>> >>>>>> Thanks for your feedback. I think it’s good to support jdbc-sdk. However, >>>>>> it's not supported in the gateway side yet. In my opinion, this FLIP is >>>>>> more >>>>>> concerned with the SQL Client. How about put “supporting jdbc-sdk” in >>>>>> ‘Future Work’? We can discuss how to implement it in another thread. >>>>>> >>>>>> Best, >>>>>> Yu Zelin >>>>>>> 2022年12月2日 18:12,Shammon FY <zjur...@gmail.com> 写道: >>>>>>> >>>>>>> Hi zelin >>>>>>> >>>>>>> Thanks for driving this discussion. >>>>>>> >>>>>>> I notice that the sql-client will interact with sql-gateway by `REST >>>>>>> Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk >>>>>>> for >>>>>>> sql-gateway? >>>>>>> >>>>>>> Then the sql-client can connect the gateway with jdbc-sdk, on the other >>>>>>> hand, the other applications and tools such as jmeter can use the >>>>>>> jdbc-sdk >>>>>>> to connect sql-gateway too. >>>>>>> >>>>>>> Best, >>>>>>> Shammon >>>>>>> >>>>>>> >>>>>>> On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin....@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Jim, >>>>>>>> >>>>>>>> Thanks for your feedback! >>>>>>>> >>>>>>>>> Should this configuration be mentioned in the FLIP? >>>>>>>> >>>>>>>> Sure. >>>>>>>> >>>>>>>>> some way for the server to be able to limit the number of requests it >>>>>>>> receives. >>>>>>>> I’m sorry that this FLIP is dedicated in implementing the Remote mode, >>>>>>>> so >>>>>>>> we >>>>>>>> didn't consider much about this. I think the option is enough >>>>>>>> currently. >>>>>>>> I will add >>>>>>>> the improvement suggestions to the ‘Future Work’. >>>>>>>> >>>>>>>>> I wonder if two other options are possible >>>>>>>> >>>>>>>> To forward the raw format to gateway and then to client is possible. >>>>>>>> The >>>>>>>> raw >>>>>>>> results from sink is in ‘CollectResultIterator#bufferedResult’. First, >>>>>>>> we >>>>>>>> can find >>>>>>>> a way to get this result without wrapping it. Second, constructing a >>>>>>>> ‘InternalTypeInfo’. >>>>>>>> We can construct it using the schema information (data’s logical type). >>>>>>>> After >>>>>>>> construction, we can get the ’TypeSerializer’ to deserialize the raw >>>>>>>> result. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> 2022年12月1日 04:54,Jim Hughes <jhug...@confluent.io.INVALID> 写道: >>>>>>>>> >>>>>>>>> Hi Yu, >>>>>>>>> >>>>>>>>> Thanks for moving my comments to this thread! Also, thank you for >>>>>>>>> answering my questions; it is helping me understand the SQL Gateway >>>>>>>>> better. >>>>>>>>> >>>>>>>>> 5. >>>>>>>>>> Our idea is to introduce a new session option (like >>>>>>>>> 'sql-client.result.fetch-interval') to control >>>>>>>>> the fetching requests sending frequency. What do you think? >>>>>>>>> >>>>>>>>> Should this configuration be mentioned in the FLIP? >>>>>>>>> >>>>>>>>> One slight concern I have with having >>>>>>>>> 'sql-client.result.fetch-interval' >>>>>>>> as >>>>>>>>> a session configuration is that users could set it low and cause the >>>>>>>> client >>>>>>>>> to send a large volume of requests to the SQL gateway. >>>>>>>>> >>>>>>>>> Generally, I'd like to see some way for the server to be able to limit >>>>>>>> the >>>>>>>>> number of requests it receives. If that really needs to be done by a >>>>>>>> proxy >>>>>>>>> in front of the SQL gateway, that is fine as well. (To be clear, I >>>>>>>>> don't >>>>>>>>> think my concern here should be blocking in any way.) >>>>>>>>> >>>>>>>>> 7. >>>>>>>>>> What is the serialization lifecycle for results? >>>>>>>>> >>>>>>>>> I wonder if two other options are possible: >>>>>>>>> 3) Could the Gateway just forward the result byte array? (Or does the >>>>>>>>> Gateway need to deserialize the response in order to understand it for >>>>>>>> some >>>>>>>>> reason?) >>>>>>>>> 4) Could the JobManager prepare the results in JSON? (Or similarly >>>>>>>>> could >>>>>>>>> the Client read the format which the JobManager sends?) >>>>>>>>> >>>>>>>>> Thanks again! >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Jim >>>>>>>>> >>>>>>>>> On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin....@gmail.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi, all >>>>>>>>>> >>>>>>>>>> Thanks Jim’s questions below. Here I’d like to reply to them. >>>>>>>>>> >>>>>>>>>>> 1. For the Client Parser, is it going to work with the extended >>>>>>>>>>> syntax >>>>>>>>>>> from the Flink Table Store? >>>>>>>>>>> >>>>>>>>>>> 2. Relatedly, what will happen if an older Client tries to handle >>>>>>>>>> syntax >>>>>>>>>>> that a newer service supports? (Suppose I use a 1.17 client with a >>>>>>>>>> 1.18 >>>>>>>>>>> Gateway/system which has a new keyword. Is there anything we should >>>>>>>> be >>>>>>>>>>> designing for upfront?) >>>>>>>>>>> >>>>>>>>>>> 3. How will client and server version mismatches be handled? Will a >>>>>>>>>>> single gateway be able to support multiple endpoint versions? >>>>>>>>>>> 4. How are commands which change a session handled? Are those sent >>>>>>>> via >>>>>>>>>>> an ExecuteStatementRequest? >>>>>>>>>>> >>>>>>>>>>> 5. The remote POC uses polling for getting back status and getting >>>>>>>> back >>>>>>>>>>> results. Would it be possible to switch to web sockets or some >>>>>>>>>>> other >>>>>>>>>>> mechanism to avoid polling? If polling is used for both, the >>>>>>>>>>> polling >>>>>>>>>>> frequency should be different between local and remote >>>>>>>>>>> configurations. >>>>>>>>>>> >>>>>>>>>>> 6. What does this sentence mean? "The reason why we didn't get the >>>>>>>> sql >>>>>>>>>>> type in client side is because it's hard for the lightweight >>>>>>>>>> client-level >>>>>>>>>>> parser to recognize some sql type sql, such as query with CTE. " >>>>>>>>>>> >>>>>>>>>>> 7. What is the serialization lifecycle for results? It makes sense >>>>>>>>>>> to >>>>>>>>>>> have some control over whether the gateway returns results as SQL or >>>>>>>>>> JSON. >>>>>>>>>>> I'd love to see a way to avoid needing to serialize and deserialize >>>>>>>>>> results >>>>>>>>>>> on the SQL Gateway if possible. I'm still new enough to the project >>>>>>>>>> that >>>>>>>>>>> I'm not sure if that's readily possible. Maybe the SQL Gateway's >>>>>>>>>> return >>>>>>>>>>> type can be sent as part of the request so that the JobManager can >>>>>>>> send >>>>>>>>>>> back results in an advantageous format? >>>>>>>>>>> >>>>>>>>>>> 8. Does ErrorType need to be marked as @PublicEvolving? >>>>>>>>>>> >>>>>>>>>>> I'm excited for the SQL client to support gateway mode! Given the >>>>>>>> change >>>>>>>>>>> in design, do you think it'll still be part of the Flink 1.17 >>>>>>>>>>> release? >>>>>>>>>> >>>>>>>>>> 1. ClientParser can work with new (and unknown) SQL syntax. It is >>>>>>>> because >>>>>>>>>> if the >>>>>>>>>> sql type is not recognized, the sql will be submitted to the gateway >>>>>>>>>> directly. >>>>>>>>>> >>>>>>>>>> For more information: Actually, the proposed ClientParser only do two >>>>>>>>>> things: >>>>>>>>>> (1) Tell client commands (help, clear, etc) and sqls apart. >>>>>>>>>> (2) parses several sql types (e.g. SHOW CREATE statement, we can >>>>>>>>>> print >>>>>>>> raw >>>>>>>>>> string >>>>>>>>>> for the SHOW CREATE result instead of table). Here the recognization >>>>>>>>>> of >>>>>>>>>> sql types >>>>>>>>>> mostly affects the print style, and unrecognized sql also can be >>>>>>>> submitted >>>>>>>>>> to cluster. >>>>>>>>>> So the Client with new ClientParser can work compatible with new >>>>>>>>>> syntax. >>>>>>>>>> >>>>>>>>>> 2. First, I'd like to explain that the gateway APIs and supported >>>>>>>>>> syntax >>>>>>>>>> is two things. >>>>>>>>>> For example, ‘configureSession' and 'completeStatement' are APIs. As >>>>>>>>>> mentioned >>>>>>>>>> in #1, the sql statements which syntax is unknown will be submitted >>>>>>>>>> to >>>>>>>> the >>>>>>>>>> gateway, >>>>>>>>>> and whether they can be executed normally depends on whether the >>>>>>>> execution >>>>>>>>>> environment supports the syntax. >>>>>>>>>> >>>>>>>>>>> Is there anything we should be designing for upfront? >>>>>>>>>> >>>>>>>>>> The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql >>>>>>>>>> gateway APIs. >>>>>>>>>> >>>>>>>>>> 3. >>>>>>>>>>> How will client and server version mismatches be handled? >>>>>>>>>> >>>>>>>>>> A lower version client can work compatible with a higher version >>>>>>>>>> gateway >>>>>>>>>> because the >>>>>>>>>> old interfaces won’t be deleted. When a higher version client >>>>>>>>>> connects >>>>>>>> to >>>>>>>>>> a lower version >>>>>>>>>> gateway, the client should notify the users if they try to use >>>>>>>> unsupported >>>>>>>>>> features. For >>>>>>>>>> example, the client start option ‘-i’ means using initialization >>>>>>>>>> file >>>>>>>> to >>>>>>>>>> initialize the session. >>>>>>>>>> We plan to use the gateway’s ‘configureSession’ to implement it. But >>>>>>>> this >>>>>>>>>> API is not >>>>>>>>>> implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if >>>>>>>>>> the >>>>>>>>>> user try to >>>>>>>>>> use ‘-i’ option to start the client with the 1.16 gateway, the client >>>>>>>>>> should tell the user that >>>>>>>>>> Can’t execute ‘-i’ option with gateway which version is lower than >>>>>>>>>> V2. >>>>>>>>>> >>>>>>>>>>> Will a single gateway be able to support multiple endpoint versions? >>>>>>>>>> >>>>>>>>>> Currently, the gateway only starts a highest version endpoint and the >>>>>>>>>> higher version endpoint >>>>>>>>>> is compatible with the lower version endpoint’s protocol. >>>>>>>>>> >>>>>>>>>> 4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the >>>>>>>> session >>>>>>>>>> configuration. >>>>>>>>>> Notice: the client can’t change the session (I mean, close current >>>>>>>> session >>>>>>>>>> and open another >>>>>>>>>> one). I’m not sure if you have need to change the session itself? >>>>>>>>>> >>>>>>>>>> 5. >>>>>>>>>>> Would it be possible to switch to web sockets or some other >>>>>>>>>>> mechanism >>>>>>>>>> to avoid polling? >>>>>>>>>> >>>>>>>>>> Your suggestion is very good, but this flip is for supporting the >>>>>>>>>> remote >>>>>>>>>> client. How about taking >>>>>>>>>> it as a future work? >>>>>>>>>> >>>>>>>>>>> If polling is used for both, the polling frequency should be >>>>>>>>>>> different >>>>>>>>>> between local and remote >>>>>>>>>> configurations. >>>>>>>>>> >>>>>>>>>> Our idea is to introduce a new session option (like >>>>>>>>>> 'sql-client.result.fetch-interval') to control >>>>>>>>>> the fetching requests sending frequency. What do you think? >>>>>>>>>> >>>>>>>>>> For more information: we are inclined to keep the polling behavior in >>>>>>>> this >>>>>>>>>> version. For streaming >>>>>>>>>> query, fetching results synchronously may occupy resources of the >>>>>>>> gateway >>>>>>>>>> in a long period. >>>>>>>>>> For example, if the job doesn’t return results for a long time >>>>>>>>>> because >>>>>>>> the >>>>>>>>>> window has not been >>>>>>>>>> triggered, the synchronously fetching will keep occupying the >>>>>>>> connection. >>>>>>>>>> In asynchronous >>>>>>>>>> situation, the gateway can return a NOT_READY_RESULT quickly and >>>>>>>>>> release >>>>>>>>>> the resources >>>>>>>>>> for other clients to use. I think we can make some improvements for >>>>>>>>>> the >>>>>>>>>> whole flow path in the >>>>>>>>>> future. >>>>>>>>>> >>>>>>>>>> 6. Sorry for that there is mistakes in this sentence. Let me make it >>>>>>>> clear. >>>>>>>>>> >>>>>>>>>> We proposed to add 'ContentType' to indicates the result is for what >>>>>>>> kind >>>>>>>>>> of sql. In this sentence, >>>>>>>>>> I want to explain why we add 'ContentType' since the ClientParser can >>>>>>>>>> recognize the sql type too. >>>>>>>>>> It is because the proposed ClientParser can't recognize complex >>>>>>>>>> syntax. >>>>>>>>>> For example, it can’t >>>>>>>>>> recognize query with CTE. So the result should carry content type >>>>>>>>>> information to help the client to >>>>>>>>>> know the sql type. For example, the 'ContentType.QUERY_RESULT' >>>>>>>>>> indicates >>>>>>>>>> the result is for a >>>>>>>>>> query statement. >>>>>>>>>> >>>>>>>>>> 7. >>>>>>>>>>> What is the serialization lifecycle for results? >>>>>>>>>> >>>>>>>>>> 1) Sink to JobManager : RowData -> Byte[ ] (serialize) >>>>>>>>>> 2) JobManager to Gateway : Byte[ ] -> RowData (deserialize) >>>>>>>>>> 3) Gateway sending : RowData -> Byte[ ] (serialized to >>>>>>>>>> JSON >>>>>>>>>> format) >>>>>>>>>> 4) Client receiving : Byte[ ] -> RowData (deserialize) >>>>>>>>>> >>>>>>>>>>> Maybe the SQL Gateway's return type can be sent as part of the >>>>>>>>>>> request >>>>>>>>>> so that the >>>>>>>>>> JobManager can send back results in an advantageous format? >>>>>>>>>> >>>>>>>>>> Yes. I think it's an improvement for the Client and Gateway. We have >>>>>>>> some >>>>>>>>>> ideas. For example, >>>>>>>>>> >>>>>>>>>> 1) We can move the Gateway into the JobManager and reduce the Ser/De >>>>>>>> costs >>>>>>>>>> from JM to Gateway. >>>>>>>>>> 2) Or the Gateway can collect the data from the sink function >>>>>>>>>> directly >>>>>>>>>> instead of JobManager. >>>>>>>>>> >>>>>>>>>> But I think we can leave this as a future work and discuss in another >>>>>>>>>> thread. >>>>>>>>>> >>>>>>>>>> 8. Yes. >>>>>>>>>> >>>>>>>>>>> Do you think it'll still be part of the Flink 1.17 release? >>>>>>>>>> Yes. We will try our best to finish the work. >>>>>>>>>> >>>>>>>>>> Feel free to talk to me if I’m wrong or you have any other questions. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 2022年11月25日 11:48,yu zelin <yuzelin....@gmail.com> 写道: >>>>>>>>>>> >>>>>>>>>>> Hi, all >>>>>>>>>>> >>>>>>>>>>> I want to initiate a discussion on the FLIP-275: Support Remote SQL >>>>>>>>>> Client Based on SQL Gateway[1]. >>>>>>>>>>> The motivation of this FLIP is that the current SQL Client allows >>>>>>>>>>> only >>>>>>>>>> local connection which can not satisfy >>>>>>>>>>> the common need of connecting to a remote cluster. >>>>>>>>>>> >>>>>>>>>>> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to >>>>>>>>>> implement the Remote SQL Client >>>>>>>>>>> based on SQL Gateway. In our design, we proposed two main changes: >>>>>>>>>>> >>>>>>>>>>> 1. New remote mode client which performs connection to the remote >>>>>>>>>> gateway through REST API. >>>>>>>>>>> 2. Migration of the current local mode client. We proposed to >>>>>>>>>>> refactor >>>>>>>>>> the local client based on SQL Gateway >>>>>>>>>>> to unify the interface for two modes. >>>>>>>>>>> >>>>>>>>>>> Looking forward to your suggestions. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Yu Zelin >>>>>>>>>>> >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/x/T48ODg >>>>>>>>>>> [2] https://cwiki.apache.org/confluence/x/rIyMC >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>> >> >