Thank you for doing this, left a few questions on the GH issue

I would adopt this proposal as soon as it makes it into nightlies
(or possibly earlier if it's just a matter of regenerating the proto
definitions)

The operation flow would be like this, or what would it look like?

Client ---> GetFlightInfo (query/update operation in payload) ---> Server
---> Results (non-streamed)




On Tue, Mar 8, 2022 at 2:04 PM Micah Kornfield <emkornfi...@gmail.com>
wrote:

> Some people have already left comments on
> https://github.com/apache/arrow/pull/12571  More eyes on it would be
> appreciated.  If there aren't more comments, I'll try to start implementing
> this feature in Flight next week, and hopefully have a vote after it is
> supported in Java and C++/Python.
>
>
> Thanks,
> Micah
>
> On Fri, Mar 4, 2022 at 10:54 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I put together straw-man proposal in PR [1] for the Flight changes.
> > Ultimately, it seemed based on the use-cases discussed inlining the data
> on
> > the Ticket made the most sense.  This might be overly complex (I'm not
> sure
> > how I feel about a enum indicating partial vs full results) but welcome
> > feedback.  Once we get consensus on this proposal, I can add changes to
> > Flight SQL and try to provide reference implementations.
> >
> > [1] https://github.com/apache/arrow/pull/12571
> >
> > On Tue, Mar 1, 2022 at 10:51 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> >> Would it make sense to make this part of DoGet since it
> >>> still would be returning a record batch
> >>
> >> I would lean against this. I think in many cases the client doesn't know
> >> the size of the data that it expects.  Leaving the flexibility on the
> >> server side to send back inlined data when it thinks it makes sense, or
> a
> >> bunch of tickets when there is in fact a lot of data seems like the best
> >> option here.
> >>
> >> For cases like previewing data, you usually just want to get a small
> >>> amount
> >>> of data quickly.
> >>
> >> This is interesting and might be an additional use case.  If we did
> >> decide to extend FlightInfo we might also want a way of annotating
> inlined
> >> data with its corresponding ticket.  That way even for large results,
> you
> >> could still send back a small preview if desired.
> >>
> >> After considering it a little bit I think I'm sold that inlined data
> >> should not replace a ticket.  So in my mind the open question is whether
> >> the client needs to actively opt-in to inlined data.  The scenarios I
> could
> >> come with where inlined data isn't useful are:
> >> 1.  The client is an old client and isn't aware inline data might be
> >> returned.  In this case the main cost is of extra data on the wire and
> >> storing it as unknown fields [1].
> >> 2.  The client is a new client but still doesn't want to get inline data
> >> (it might want to distribute all consumption to other processes).  Same
> >> cost is paid as option 1.
> >>
> >> Are there other scenarios?  If servers choose reasonable limits on what
> >> data to inline, the extra complexity of negotiating with the client in
> this
> >> case might not be worth the benefits.
> >>
> >> Cheers,
> >> Micah
> >>
> >>
> >> [1] https://developers.google.com/protocol-buffers/docs/proto3#unknowns
> >>
> >> On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com> wrote:
> >>
> >>> I think this would be a useful feature and be nice to have in Flight
> >>> core.
> >>> For cases like previewing data, you usually just want to get a small
> >>> amount
> >>> of data quickly. Would it make sense to make this part of DoGet since
> it
> >>> still would be returning a record batch? Perhaps a Ticket could be made
> >>> to
> >>> have an optional FlightDescriptor that would serve as an all-in-one
> shot?
> >>>
> >>> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org> wrote:
> >>>
> >>> > I agree with something along Antoine's proposal, though: maybe we
> >>> should
> >>> > be more structured with the flags (akin to what Micah mentioned with
> >>> the
> >>> > Feature enum).
> >>> >
> >>> > Also, the flag could be embedded into the Flight SQL messages
> instead.
> >>> (So
> >>> > in effect, Flight would only add the capability to return data with
> >>> > FlightInfo, and it's up to applications, like Flight SQL, to decide
> how
> >>> > they want to take advantage of that.)
> >>> >
> >>> > I think having a completely separate method and return type and
> having
> >>> to
> >>> > poll for it beforehand somewhat defeats the purpose of having
> it/would
> >>> be
> >>> > much harder of a transition.
> >>> >
> >>> > Also: it should be `repeated FlightInfo inline_data` right? In case
> we
> >>> > also need dictionary batches?
> >>> >
> >>> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
> >>> > > Can we just add the following field to the FlightDescriptor
> message:
> >>> > >
> >>> > >   bool accept_inline_data = 4;
> >>> > >
> >>> > > and this one to the FlightInfo message:
> >>> > >
> >>> > >   FlightData inline_data = 100;
> >>> > >
> >>> > > Then new clients can `accept_inline_data` to true (the default
> being
> >>> > > false if omitted) to signal servers that they can put the data if
> >>> > > `inline_data` if deemed small enough.
> >>> > >
> >>> > > (the `accept_inline_data` field could also be used to the Criteria
> >>> > > message)
> >>> > >
> >>> > >
> >>> > > Alternatively, if the FlightDescriptor expansion looks a bit dirty
> >>> > > (FlightDescriptor being used in other contexts where
> >>> > > `accept_inline_data` makes no sense), we can instead define a new
> >>> > > method:
> >>> > >
> >>> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns (FlightInfo) {}
> >>> > >
> >>> > > with:
> >>> > >
> >>> > > message GetFlightInfoRequest {
> >>> > >   FlightDescriptor flight_descriptor = 1;
> >>> > >   bool accept_inline_data = 2;
> >>> > > }
> >>> > >
> >>> > > Regards
> >>> > >
> >>> > > Antoine.
> >>> > >
> >>> > >
> >>> > > On Mon, 28 Feb 2022 11:29:12 -0800
> >>> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
> >>> > >> This seems reasonable, however we need to account for existing
> >>> Flight
> >>> > >> clients that were written before this.
> >>> > >>
> >>> > >> It seems like the server will need to still handle the ticket
> >>> returned
> >>> > for
> >>> > >> getStream() for clients that are unaware of the small result
> >>> > optimization.
> >>> > >>
> >>> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <lidav...@apache.org>
> >>> wrote:
> >>> > >>
> >>> > >> > Ah, that makes more sense, that would be a reasonable extension
> to
> >>> > Flight
> >>> > >> > overall. (While we're at it, I think it would help to have an
> >>> > app_metadata
> >>> > >> > field in FlightInfo as well.)
> >>> > >> >
> >>> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
> >>> > >> > >>
> >>> > >> > >> But it seems reasonable to add a one-shot query path using
> >>> DoGet.
> >>> > >> > >
> >>> > >> > >
> >>> > >> > > I was thinking more of adding a bytes field to FlightInfo that
> >>> > could
> >>> > >> > store
> >>> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
> >>> necessary
> >>> > for
> >>> > >> > > small results when executing a CMD.  The client doesn't
> >>> necessarily
> >>> > know
> >>> > >> > > whether a query will return large or small results.
> >>> > >> > >
> >>> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <
> lidav...@apache.org>
> >>> > wrote:
> >>> > >> > >
> >>> > >> > >> I think the focus was on large result sets (though I don't
> >>> recall
> >>> > this
> >>> > >> > >> being discussed before) and supporting multi-node setups
> (hence
> >>> > >> > >> GetFlightInfo/DoGet are separated). But it seems reasonable
> to
> >>> add
> >>> > a
> >>> > >> > >> one-shot query path using DoGet.
> >>> > >> > >>
> >>> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
> >>> > >> > >> > I saw the same. A small, stateless query ability would be
> >>> nice
> >>> > >> > >> (connection
> >>> > >> > >> > open, initialization, query in one message, the resultset
> in
> >>> > the
> >>> > >> > response
> >>> > >> > >> > in one message)
> >>> > >> > >> >
> >>> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
> >>> > emkornfi...@gmail.com>
> >>> > >> > >> wrote:
> >>> > >> > >> >
> >>> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not
> sure
> >>> if
> >>> > I'm
> >>> > >> > >> missing
> >>> > >> > >> >> it but is there any optimization for small results?  My
> >>> concern
> >>> > is
> >>> > >> > that
> >>> > >> > >> the
> >>> > >> > >> >> overhead of the RPCs for the DoGet after executing the
> query
> >>> > could
> >>> > >> > add
> >>> > >> > >> >> non-trivial latency for smaller results.
> >>> > >> > >> >>
> >>> > >> > >> >> Has anybody else thought about this/investigated it?  Am I
> >>> > >> > understanding
> >>> > >> > >> >> this correctly?
> >>> > >> > >> >>
> >>> > >> > >> >> Thanks,
> >>> > >> > >> >> Micah
> >>> > >> > >> >>
> >>> > >> > >>
> >>> > >> >
> >>> > >>
> >>> > >>
> >>> >
> >>>
> >>
>

Reply via email to