Hi David,

Thank you very much for your proposal.
My comments about it are as follows:

About PollFlightInfo:

Many SQL queries (in fact, almost all OLAP queries?) cannot produce any
output records until they complete, because of a GROUP BY or ORDER BY clause.
In that case, PollFlightInfo can degenerate to GetFlightInfo, since the
server will not respond unless the result changes. If the 'progress' field
of RetryInfo is also regarded as part of the result, the server can respond
with an updated progress value. But a server that does not know its
progress cannot do that.
The client can call the RPC with a timeout to avoid arbitrarily long
polling, but in that case, the client would not be able to get a descriptor
for cancellation of the query if the first PollFlightInfo does not return
soon. Maybe it should be specified that the server processing PollFlightInfo
must return immediately after it parses the query and starts executing it to
provide the cancel_descriptor as soon as possible.
Regarding cancel_descriptor, it would be nice if the server could leave it
unset even while the query is still in progress, to tell the client that
query cancellation is not supported.
BTW, I thought of something like StreamingGetFlightInfo, a bidirectional
streaming version of PollFlightInfo. But maybe PollFlightInfo is better,
since a client that does not own the gRPC call stream can still cancel the
query. (Or maybe StreamingGetFlightInfo could send a cancel_descriptor for
use outside the stream.)
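
As a rough illustration of the "return immediately" behavior suggested
above, here is a minimal Python sketch. It is not Flight code: FakeServer,
run_query, and the exact shape of RetryInfo are hypothetical (only
'progress' and cancel_descriptor are named in this thread), and the server
is simulated in memory. The point is that when every poll returns promptly,
the client always holds a cancel_descriptor and can give up after a bounded
number of polls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryInfo:
    # Hypothetical shape; only 'progress' and 'cancel_descriptor' come from the thread.
    info: Optional[str]               # stands in for FlightInfo; None until results exist
    progress: Optional[float]         # None when the server cannot estimate progress
    cancel_descriptor: Optional[str]  # None when cancellation is not supported


class FakeServer:
    """In-memory stand-in for a server; not a real Flight service."""

    def __init__(self, polls_until_done: int) -> None:
        self._remaining = polls_until_done
        self.cancelled = False

    def poll_flight_info(self, query: str) -> RetryInfo:
        # Return right away instead of blocking until the result changes.
        if self._remaining > 0:
            self._remaining -= 1
            return RetryInfo(info=None, progress=None,
                             cancel_descriptor="cancel:" + query)
        return RetryInfo(info="result:" + query, progress=1.0,
                         cancel_descriptor=None)

    def cancel(self, cancel_descriptor: str) -> None:
        self.cancelled = True


def run_query(server: FakeServer, query: str, max_polls: int) -> Optional[str]:
    """Poll up to max_polls times; cancel if the query has not finished by then."""
    cancel_descriptor = None
    for _ in range(max_polls):
        reply = server.poll_flight_info(query)
        if reply.info is not None:
            return reply.info
        cancel_descriptor = reply.cancel_descriptor
    if cancel_descriptor is not None:
        server.cancel(cancel_descriptor)
    return None
```

If the first poll blocked instead, a client-side timeout would fire before
any cancel_descriptor had been received, which is the problem described
above.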

About CloseQuery:

I think it would be great if this RPC were part of Flight RPC rather than
Flight SQL, since the FlightInfo that it closes is obtained from
GetFlightInfo/PollFlightInfo in Flight RPC. In that case, maybe it would be
nice to name it 'CloseFlightInfo', to match GetFlightInfo.

About RefreshQuery:

Same as CloseQuery. Maybe it can be named 'RetainFlightInfo'.

About CancelQuery:

I don't see how to use it. CancelQuery requires a FlightInfo from the server.
But by the time the client receives the FlightInfo, the query has already
completed, hasn't it?

Another (unrelated?) request (not in the proposal):

In DoGet, the client must consume the whole endpoint. This makes things
difficult for a client that wants, or is only able, to retrieve a small
portion of it. (For example, a web client may display the result in tabular
form, page by page. A web server can cache the DoGet result, but then the
web server must manage state, and a stateful web server is harder to
implement and manage.) Can we have a variant of DoGet that retrieves only a
portion of an endpoint? That RPC method could take record_offset and
record_count arguments. (Maybe this defeats the purpose of Flight RPC, which
prefers fast, bulk transfer.)
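
To make the intended semantics of record_offset and record_count concrete,
here is a small Python sketch. It only models an endpoint as a sequence of
record batches (plain lists of integers here); slice_batches is a
hypothetical helper, not an existing Flight call, and a real server would
still need some way to seek to the offset cheaply.

```python
from typing import List, Sequence


def slice_batches(batches: Sequence[Sequence[int]],
                  record_offset: int,
                  record_count: int) -> List[int]:
    """Return up to record_count records starting at record_offset."""
    out: List[int] = []
    skipped = 0  # number of records in the batches already passed over
    for batch in batches:
        if len(out) >= record_count:
            break
        if skipped + len(batch) <= record_offset:
            # The whole batch lies before the requested window.
            skipped += len(batch)
            continue
        start = max(0, record_offset - skipped)
        take = record_count - len(out)
        out.extend(batch[start:start + take])
        skipped += len(batch)
    return out
```

With this, a web server could serve page N of size P by requesting
record_offset = N * P and record_count = P, without caching the result
between requests.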

Thank you.

-----Original Message-----
From: David Li <lidav...@apache.org> 
Sent: Wednesday, February 15, 2023 8:06 AM
To: dev@arrow.apache.org
Subject: Re: [DISCUSS] Flight RPC/Flight SQL/ADBC enhancements

Ah, right. I haven't written up the last set of ADBC proposals yet. I'll do
that in the next day or two.

On Tue, Feb 14, 2023, at 17:38, Will Jones wrote:
> Hi David,
>
> The proposals in the Flight/Flight SQL document look excellent. As 
> I've been looking at ADBC I've been wondering about polling / async 
> execution, cancellation, and progress indicators. Glad to see those in 
> the Flight document, but where are they in the ADBC issues? Do they 
> still need to be created?
>
> Best,
>
> Will Jones
>
> On Tue, Feb 14, 2023 at 12:58 PM David Li <lidav...@apache.org> wrote:
>
>> Hello,
>>
>> I would like to submit some Flight RPC and Flight SQL enhancements 
>> for discussion. They cover the following:
>>
>> - Executing 'queries' in a retryable, nonblocking way
>> - Handling ordered result sets
>> - Handling expiration of/re-reading result sets
>>
>> In addition, there are corresponding proposals for ADBC in 
>> anticipation of these features, James's catalogs proposal for Flight 
>> SQL, and other feedback.
>>
>> The Flight proposals are described in this document [1]. It should be 
>> open for comments.
>> The ADBC proposals are filed as individual issues in this milestone [2].
>>
>> Any feedback is much appreciated. There are not yet prototype 
>> implementations, but if there is a rough consensus then I can begin on
>> that.
>>
>> [1]: https://docs.google.com/document/d/1jhPyPZSOo2iy0LqIJVUs9KWPyFULVFJXTILDfkadx2g/edit?usp=sharing
>> [2]: https://github.com/apache/arrow-adbc/milestone/3
>>
>> Thanks,
>> David
>>
