Hi David,

Thank you very much for your proposal. My comments are as follows:

About PollFlightInfo: Many SQL queries (in fact, almost all OLAP queries?) cannot produce any output records until they complete, because of a GROUP BY or ORDER BY clause. In that case, PollFlightInfo can degenerate into GetFlightInfo, since the server will not respond unless the result changes. If the 'progress' field of RetryInfo is also regarded as part of the result, the server can respond with a different progress value. But a server that does not know the progress information cannot use that.

The client can call the RPC with a timeout to avoid arbitrarily long polling, but in that case the client would not be able to get a descriptor for cancelling the query if the first PollFlightInfo does not return soon. Maybe it should be specified that a server processing PollFlightInfo must return immediately after it parses the query and starts executing it, so that it provides the cancel_descriptor as soon as possible.

Regarding cancel_descriptor, it would be nice if the server were allowed to leave it unset even while the query is still in progress, to tell the client that query cancellation is not supported.

BTW, I thought of something like StreamingGetFlightInfo, a bidirectional streaming version of PollFlightInfo. But maybe PollFlightInfo is better, since another client that does not own the gRPC call stream can cancel the query. (Or maybe StreamingGetFlightInfo could send the cancel_descriptor for use outside the stream.)

About CloseQuery: I think it would be great if this RPC were part of Flight RPC rather than Flight SQL, since the FlightInfo that it closes is obtained from GetFlightInfo/PollFlightInfo in Flight RPC. In that case, maybe it would be nice to name it 'CloseFlightInfo', to match GetFlightInfo.

About RefreshQuery: Same as CloseQuery. Maybe it can be named 'RetainFlightInfo'.

About CancelQuery: I don't know how to use it. CancelQuery requires a FlightInfo from the server.
But by the time the client receives the FlightInfo, the query has already completed, hasn't it?

Another (unrelated?) request (not in the proposal): In DoGet, the client must consume the whole endpoint. That makes it difficult for a client that only wants to, or only can, retrieve a small portion of it. (For example, there may be a web client that displays the result in tabular format, page by page. A web server can cache the DoGet result, but then the web server must manage state, and a stateful web server is harder to implement and manage.) Can we have a variant of DoGet that retrieves only a portion of an endpoint? That RPC method could take record_offset and record_count arguments. (Maybe this defeats the purpose of Flight RPC, which prefers fast, bulk transfer.)

Thank you.

-----Original Message-----
From: David Li <lidav...@apache.org>
Sent: Wednesday, February 15, 2023 8:06 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Flight RPC/Flight SQL/ADBC enhancements

Ah, right. I haven't written up the last set of ADBC proposals yet. I'll do that in the next day or two.

On Tue, Feb 14, 2023, at 17:38, Will Jones wrote:
> Hi David,
>
> The proposals in the Flight/Flight SQL document look excellent. As
> I've been looking at ADBC I've been wondering about polling / async
> execution, cancellation, and progress indicators. Glad to see those in
> the Flight document, but where are they in the ADBC issues? Do they
> still need to be created?
>
> Best,
>
> Will Jones
>
> On Tue, Feb 14, 2023 at 12:58 PM David Li <lidav...@apache.org> wrote:
>
>> Hello,
>>
>> I would like to submit some Flight RPC and Flight SQL enhancements
>> for discussion. They cover the following:
>>
>> - Executing 'queries' in a retryable, nonblocking way
>> - Handling ordered result sets
>> - Handling expiration of/re-reading result sets
>>
>> In addition, there are corresponding proposals for ADBC in
>> anticipation of these features, James's catalogs proposal for Flight
>> SQL, and other feedback.
>>
>> The Flight proposals are described in this document [1]. It should be
>> open for comments.
>> The ADBC proposals are filed as individual issues in this milestone [2].
>>
>> Any feedback is much appreciated. There are not yet prototype
>> implementations, but if there is a rough consensus then I can begin on that.
>>
>> [1]: https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit?usp=sharing
>> [2]: https://github.com/apache/arrow-adbc/milestone/3
>>
>> Thanks,
>> David
>>