Following up here - what are the next steps? The RFC PR looks fairly complete, 
maybe we can help build out implementations in C++/Java/other languages in 
preparation for a vote?

On Wed, Mar 9, 2022, at 00:23, Micah Kornfield wrote:
>>
>> The operation flow would be like this, or what would it look like?
>> Client ---> GetFlightInfo (query/update operation in payload) ---> Server
>> ---> Results (non-streamed)
>
>
> This is roughly the flow I was imagining if the server chooses to send back
> inlined data.
>
> -Micah
>
> On Tue, Mar 8, 2022 at 11:27 AM Gavin Ray <ray.gavi...@gmail.com> wrote:
>
>> Thank you for doing this, left a few questions on the GH issue
>>
>> I would adopt this proposal as soon as it makes it into nightlies
>> (or possibly earlier if it's just a matter of regenerating the proto
>> definitions)
>>
>> The operation flow would be like this, or what would it look like?
>>
>> Client ---> GetFlightInfo (query/update operation in payload) ---> Server
>> ---> Results (non-streamed)
>>
>>
>>
>>
>> On Tue, Mar 8, 2022 at 2:04 PM Micah Kornfield <emkornfi...@gmail.com>
>> wrote:
>>
>>> Some people have already left comments on
>>> https://github.com/apache/arrow/pull/12571  More eyes on it would be
>>> appreciated.  If there aren't more comments, I'll try to start
>>> implementing
>>> this feature in Flight next week, and hopefully have a vote after it is
>>> supported in Java and C++/Python.
>>>
>>>
>>> Thanks,
>>> Micah
>>>
>>> On Fri, Mar 4, 2022 at 10:54 PM Micah Kornfield <emkornfi...@gmail.com>
>>> wrote:
>>>
>>> > I put together straw-man proposal in PR [1] for the Flight changes.
>>> > Ultimately, it seemed based on the use-cases discussed inlining the
>>> data on
>>> > the Ticket made the most sense.  This might be overly complex (I'm not
>>> sure
>>> > how I feel about a enum indicating partial vs full results) but welcome
>>> > feedback.  Once we get consensus on this proposal, I can add changes to
>>> > Flight SQL and try to provide reference implementations.
>>> >
>>> > [1] https://github.com/apache/arrow/pull/12571
>>> >
>>> > On Tue, Mar 1, 2022 at 10:51 PM Micah Kornfield <emkornfi...@gmail.com>
>>> > wrote:
>>> >
>>> >> Would it make sense to make this part of DoGet since it
>>> >>> still would be returning a record batch
>>> >>
>>> >> I would lean against this. I think in many cases the client doesn't
>>> know
>>> >> the size of the data that it expects.  Leaving the flexibility on the
>>> >> server side to send back inlined data when it thinks it makes sense,
>>> or a
>>> >> bunch of tickets when there is in fact a lot of data seems like the
>>> best
>>> >> option here.
>>> >>
>>> >> For cases like previewing data, you usually just want to get a small
>>> >>> amount
>>> >>> of data quickly.
>>> >>
>>> >> This is interesting and might be an additional use case.  If we did
>>> >> decide to extend FlightInfo we might also want a way of annotating
>>> inlined
>>> >> data with its corresponding ticket.  That way even for large results,
>>> you
>>> >> could still send back a small preview if desired.
>>> >>
>>> >> After considering it a little bit I think I'm sold that inlined data
>>> >> should not replace a ticket.  So in my mind the open question is
>>> whether
>>> >> the client needs to actively opt-in to inlined data.  The scenarios I
>>> could
>>> >> come with where inlined data isn't useful are:
>>> >> 1.  The client is an old client and isn't aware inline data might be
>>> >> returned.  In this case the main cost is of extra data on the wire and
>>> >> storing it as unknown fields [1].
>>> >> 2.  The client is a new client but still doesn't want to get inline
>>> data
>>> >> (it might want to distribute all consumption to other processes).  Same
>>> >> cost is paid as option 1.
>>> >>
>>> >> Are there other scenarios?  If servers choose reasonable limits on what
>>> >> data to inline, the extra complexity of negotiating with the client in
>>> this
>>> >> case might not be worth the benefits.
>>> >>
>>> >> Cheers,
>>> >> Micah
>>> >>
>>> >>
>>> >> [1]
>>> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
>>> >>
>>> >> On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com>
>>> wrote:
>>> >>
>>> >>> I think this would be a useful feature and be nice to have in Flight
>>> >>> core.
>>> >>> For cases like previewing data, you usually just want to get a small
>>> >>> amount
>>> >>> of data quickly. Would it make sense to make this part of DoGet since
>>> it
>>> >>> still would be returning a record batch? Perhaps a Ticket could be
>>> made
>>> >>> to
>>> >>> have an optional FlightDescriptor that would serve as an all-in-one
>>> shot?
>>> >>>
>>> >>> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org> wrote:
>>> >>>
>>> >>> > I agree with something along Antoine's proposal, though: maybe we
>>> >>> should
>>> >>> > be more structured with the flags (akin to what Micah mentioned with
>>> >>> the
>>> >>> > Feature enum).
>>> >>> >
>>> >>> > Also, the flag could be embedded into the Flight SQL messages
>>> instead.
>>> >>> (So
>>> >>> > in effect, Flight would only add the capability to return data with
>>> >>> > FlightInfo, and it's up to applications, like Flight SQL, to decide
>>> how
>>> >>> > they want to take advantage of that.)
>>> >>> >
>>> >>> > I think having a completely separate method and return type and
>>> having
>>> >>> to
>>> >>> > poll for it beforehand somewhat defeats the purpose of having
>>> it/would
>>> >>> be
>>> >>> > much harder of a transition.
>>> >>> >
>>> >>> > Also: it should be `repeated FlightInfo inline_data` right? In case
>>> we
>>> >>> > also need dictionary batches?
>>> >>> >
>>> >>> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
>>> >>> > > Can we just add the following field to the FlightDescriptor
>>> message:
>>> >>> > >
>>> >>> > >   bool accept_inline_data = 4;
>>> >>> > >
>>> >>> > > and this one to the FlightInfo message:
>>> >>> > >
>>> >>> > >   FlightData inline_data = 100;
>>> >>> > >
>>> >>> > > Then new clients can `accept_inline_data` to true (the default
>>> being
>>> >>> > > false if omitted) to signal servers that they can put the data if
>>> >>> > > `inline_data` if deemed small enough.
>>> >>> > >
>>> >>> > > (the `accept_inline_data` field could also be used to the Criteria
>>> >>> > > message)
>>> >>> > >
>>> >>> > >
>>> >>> > > Alternatively, if the FlightDescriptor expansion looks a bit dirty
>>> >>> > > (FlightDescriptor being used in other contexts where
>>> >>> > > `accept_inline_data` makes no sense), we can instead define a new
>>> >>> > > method:
>>> >>> > >
>>> >>> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns (FlightInfo)
>>> {}
>>> >>> > >
>>> >>> > > with:
>>> >>> > >
>>> >>> > > message GetFlightInfoRequest {
>>> >>> > >   FlightDescriptor flight_descriptor = 1;
>>> >>> > >   bool accept_inline_data = 2;
>>> >>> > > }
>>> >>> > >
>>> >>> > > Regards
>>> >>> > >
>>> >>> > > Antoine.
>>> >>> > >
>>> >>> > >
>>> >>> > > On Mon, 28 Feb 2022 11:29:12 -0800
>>> >>> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
>>> >>> > >> This seems reasonable, however we need to account for existing
>>> >>> Flight
>>> >>> > >> clients that were written before this.
>>> >>> > >>
>>> >>> > >> It seems like the server will need to still handle the ticket
>>> >>> returned
>>> >>> > for
>>> >>> > >> getStream() for clients that are unaware of the small result
>>> >>> > optimization.
>>> >>> > >>
>>> >>> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <lidav...@apache.org>
>>> >>> wrote:
>>> >>> > >>
>>> >>> > >> > Ah, that makes more sense, that would be a reasonable
>>> extension to
>>> >>> > Flight
>>> >>> > >> > overall. (While we're at it, I think it would help to have an
>>> >>> > app_metadata
>>> >>> > >> > field in FlightInfo as well.)
>>> >>> > >> >
>>> >>> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
>>> >>> > >> > >>
>>> >>> > >> > >> But it seems reasonable to add a one-shot query path using
>>> >>> DoGet.
>>> >>> > >> > >
>>> >>> > >> > >
>>> >>> > >> > > I was thinking more of adding a bytes field to FlightInfo
>>> that
>>> >>> > could
>>> >>> > >> > store
>>> >>> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
>>> >>> necessary
>>> >>> > for
>>> >>> > >> > > small results when executing a CMD.  The client doesn't
>>> >>> necessarily
>>> >>> > know
>>> >>> > >> > > whether a query will return large or small results.
>>> >>> > >> > >
>>> >>> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <
>>> lidav...@apache.org>
>>> >>> > wrote:
>>> >>> > >> > >
>>> >>> > >> > >> I think the focus was on large result sets (though I don't
>>> >>> recall
>>> >>> > this
>>> >>> > >> > >> being discussed before) and supporting multi-node setups
>>> (hence
>>> >>> > >> > >> GetFlightInfo/DoGet are separated). But it seems reasonable
>>> to
>>> >>> add
>>> >>> > a
>>> >>> > >> > >> one-shot query path using DoGet.
>>> >>> > >> > >>
>>> >>> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
>>> >>> > >> > >> > I saw the same. A small, stateless query ability would be
>>> >>> nice
>>> >>> > >> > >> (connection
>>> >>> > >> > >> > open, initialization, query in one message, the resultset
>>> in
>>> >>> > the
>>> >>> > >> > response
>>> >>> > >> > >> > in one message)
>>> >>> > >> > >> >
>>> >>> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
>>> >>> > emkornfi...@gmail.com>
>>> >>> > >> > >> wrote:
>>> >>> > >> > >> >
>>> >>> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not
>>> sure
>>> >>> if
>>> >>> > I'm
>>> >>> > >> > >> missing
>>> >>> > >> > >> >> it but is there any optimization for small results?  My
>>> >>> concern
>>> >>> > is
>>> >>> > >> > that
>>> >>> > >> > >> the
>>> >>> > >> > >> >> overhead of the RPCs for the DoGet after executing the
>>> query
>>> >>> > could
>>> >>> > >> > add
>>> >>> > >> > >> >> non-trivial latency for smaller results.
>>> >>> > >> > >> >>
>>> >>> > >> > >> >> Has anybody else thought about this/investigated it?  Am
>>> I
>>> >>> > >> > understanding
>>> >>> > >> > >> >> this correctly?
>>> >>> > >> > >> >>
>>> >>> > >> > >> >> Thanks,
>>> >>> > >> > >> >> Micah
>>> >>> > >> > >> >>
>>> >>> > >> > >>
>>> >>> > >> >
>>> >>> > >>
>>> >>> > >>
>>> >>> >
>>> >>>
>>> >>
>>>
>>

Reply via email to