Yes, next step is implementation which I've been delayed on.  I hope to
have a little time this week to work on it and will post an update.

On Wed, Apr 27, 2022 at 7:48 AM David Li <lidav...@apache.org> wrote:

> Following up here - what are the next steps? The RFC PR looks fairly
> complete, maybe we can help build out implementations in C++/Java/other
> languages in preparation for a vote?
>
> On Wed, Mar 9, 2022, at 00:23, Micah Kornfield wrote:
> >>
> >> The operation flow would be like this, or what would it look like?
> >> Client ---> GetFlightInfo (query/update operation in payload) --->
> Server
> >> ---> Results (non-streamed)
> >
> >
> > This is roughly the flow I was imagining if the server chooses to send
> back
> > inlined data.
> >
> > -Micah
> >
> > On Tue, Mar 8, 2022 at 11:27 AM Gavin Ray <ray.gavi...@gmail.com> wrote:
> >
> >> Thank you for doing this, left a few questions on the GH issue
> >>
> >> I would adopt this proposal as soon as it makes it into nightlies
> >> (or possibly earlier if it's just a matter of regenerating the proto
> >> definitions)
> >>
> >> The operation flow would be like this, or what would it look like?
> >>
> >> Client ---> GetFlightInfo (query/update operation in payload) --->
> Server
> >> ---> Results (non-streamed)
> >>
> >>
> >>
> >>
> >> On Tue, Mar 8, 2022 at 2:04 PM Micah Kornfield <emkornfi...@gmail.com>
> >> wrote:
> >>
> >>> Some people have already left comments on
> >>> https://github.com/apache/arrow/pull/12571  More eyes on it would be
> >>> appreciated.  If there aren't more comments, I'll try to start
> >>> implementing
> >>> this feature in Flight next week, and hopefully have a vote after it is
> >>> supported in Java and C++/Python.
> >>>
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>> On Fri, Mar 4, 2022 at 10:54 PM Micah Kornfield <emkornfi...@gmail.com
> >
> >>> wrote:
> >>>
> >>> > I put together straw-man proposal in PR [1] for the Flight changes.
> >>> > Ultimately, it seemed based on the use-cases discussed inlining the
> >>> data on
> >>> > the Ticket made the most sense.  This might be overly complex (I'm
> not
> >>> sure
> >>> > how I feel about a enum indicating partial vs full results) but
> welcome
> >>> > feedback.  Once we get consensus on this proposal, I can add changes
> to
> >>> > Flight SQL and try to provide reference implementations.
> >>> >
> >>> > [1] https://github.com/apache/arrow/pull/12571
> >>> >
> >>> > On Tue, Mar 1, 2022 at 10:51 PM Micah Kornfield <
> emkornfi...@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Would it make sense to make this part of DoGet since it
> >>> >>> still would be returning a record batch
> >>> >>
> >>> >> I would lean against this. I think in many cases the client doesn't
> >>> know
> >>> >> the size of the data that it expects.  Leaving the flexibility on
> the
> >>> >> server side to send back inlined data when it thinks it makes sense,
> >>> or a
> >>> >> bunch of tickets when there is in fact a lot of data seems like the
> >>> best
> >>> >> option here.
> >>> >>
> >>> >> For cases like previewing data, you usually just want to get a small
> >>> >>> amount
> >>> >>> of data quickly.
> >>> >>
> >>> >> This is interesting and might be an additional use case.  If we did
> >>> >> decide to extend FlightInfo we might also want a way of annotating
> >>> inlined
> >>> >> data with its corresponding ticket.  That way even for large
> results,
> >>> you
> >>> >> could still send back a small preview if desired.
> >>> >>
> >>> >> After considering it a little bit I think I'm sold that inlined data
> >>> >> should not replace a ticket.  So in my mind the open question is
> >>> whether
> >>> >> the client needs to actively opt-in to inlined data.  The scenarios
> I
> >>> could
> >>> >> come with where inlined data isn't useful are:
> >>> >> 1.  The client is an old client and isn't aware inline data might be
> >>> >> returned.  In this case the main cost is of extra data on the wire
> and
> >>> >> storing it as unknown fields [1].
> >>> >> 2.  The client is a new client but still doesn't want to get inline
> >>> data
> >>> >> (it might want to distribute all consumption to other processes).
> Same
> >>> >> cost is paid as option 1.
> >>> >>
> >>> >> Are there other scenarios?  If servers choose reasonable limits on
> what
> >>> >> data to inline, the extra complexity of negotiating with the client
> in
> >>> this
> >>> >> case might not be worth the benefits.
> >>> >>
> >>> >> Cheers,
> >>> >> Micah
> >>> >>
> >>> >>
> >>> >> [1]
> >>> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
> >>> >>
> >>> >> On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com>
> >>> wrote:
> >>> >>
> >>> >>> I think this would be a useful feature and be nice to have in
> Flight
> >>> >>> core.
> >>> >>> For cases like previewing data, you usually just want to get a
> small
> >>> >>> amount
> >>> >>> of data quickly. Would it make sense to make this part of DoGet
> since
> >>> it
> >>> >>> still would be returning a record batch? Perhaps a Ticket could be
> >>> made
> >>> >>> to
> >>> >>> have an optional FlightDescriptor that would serve as an all-in-one
> >>> shot?
> >>> >>>
> >>> >>> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org>
> wrote:
> >>> >>>
> >>> >>> > I agree with something along Antoine's proposal, though: maybe we
> >>> >>> should
> >>> >>> > be more structured with the flags (akin to what Micah mentioned
> with
> >>> >>> the
> >>> >>> > Feature enum).
> >>> >>> >
> >>> >>> > Also, the flag could be embedded into the Flight SQL messages
> >>> instead.
> >>> >>> (So
> >>> >>> > in effect, Flight would only add the capability to return data
> with
> >>> >>> > FlightInfo, and it's up to applications, like Flight SQL, to
> decide
> >>> how
> >>> >>> > they want to take advantage of that.)
> >>> >>> >
> >>> >>> > I think having a completely separate method and return type and
> >>> having
> >>> >>> to
> >>> >>> > poll for it beforehand somewhat defeats the purpose of having
> >>> it/would
> >>> >>> be
> >>> >>> > much harder of a transition.
> >>> >>> >
> >>> >>> > Also: it should be `repeated FlightInfo inline_data` right? In
> case
> >>> we
> >>> >>> > also need dictionary batches?
> >>> >>> >
> >>> >>> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
> >>> >>> > > Can we just add the following field to the FlightDescriptor
> >>> message:
> >>> >>> > >
> >>> >>> > >   bool accept_inline_data = 4;
> >>> >>> > >
> >>> >>> > > and this one to the FlightInfo message:
> >>> >>> > >
> >>> >>> > >   FlightData inline_data = 100;
> >>> >>> > >
> >>> >>> > > Then new clients can `accept_inline_data` to true (the default
> >>> being
> >>> >>> > > false if omitted) to signal servers that they can put the data
> if
> >>> >>> > > `inline_data` if deemed small enough.
> >>> >>> > >
> >>> >>> > > (the `accept_inline_data` field could also be used to the
> Criteria
> >>> >>> > > message)
> >>> >>> > >
> >>> >>> > >
> >>> >>> > > Alternatively, if the FlightDescriptor expansion looks a bit
> dirty
> >>> >>> > > (FlightDescriptor being used in other contexts where
> >>> >>> > > `accept_inline_data` makes no sense), we can instead define a
> new
> >>> >>> > > method:
> >>> >>> > >
> >>> >>> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns
> (FlightInfo)
> >>> {}
> >>> >>> > >
> >>> >>> > > with:
> >>> >>> > >
> >>> >>> > > message GetFlightInfoRequest {
> >>> >>> > >   FlightDescriptor flight_descriptor = 1;
> >>> >>> > >   bool accept_inline_data = 2;
> >>> >>> > > }
> >>> >>> > >
> >>> >>> > > Regards
> >>> >>> > >
> >>> >>> > > Antoine.
> >>> >>> > >
> >>> >>> > >
> >>> >>> > > On Mon, 28 Feb 2022 11:29:12 -0800
> >>> >>> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
> >>> >>> > >> This seems reasonable, however we need to account for existing
> >>> >>> Flight
> >>> >>> > >> clients that were written before this.
> >>> >>> > >>
> >>> >>> > >> It seems like the server will need to still handle the ticket
> >>> >>> returned
> >>> >>> > for
> >>> >>> > >> getStream() for clients that are unaware of the small result
> >>> >>> > optimization.
> >>> >>> > >>
> >>> >>> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <
> lidav...@apache.org>
> >>> >>> wrote:
> >>> >>> > >>
> >>> >>> > >> > Ah, that makes more sense, that would be a reasonable
> >>> extension to
> >>> >>> > Flight
> >>> >>> > >> > overall. (While we're at it, I think it would help to have
> an
> >>> >>> > app_metadata
> >>> >>> > >> > field in FlightInfo as well.)
> >>> >>> > >> >
> >>> >>> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
> >>> >>> > >> > >>
> >>> >>> > >> > >> But it seems reasonable to add a one-shot query path
> using
> >>> >>> DoGet.
> >>> >>> > >> > >
> >>> >>> > >> > >
> >>> >>> > >> > > I was thinking more of adding a bytes field to FlightInfo
> >>> that
> >>> >>> > could
> >>> >>> > >> > store
> >>> >>> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
> >>> >>> necessary
> >>> >>> > for
> >>> >>> > >> > > small results when executing a CMD.  The client doesn't
> >>> >>> necessarily
> >>> >>> > know
> >>> >>> > >> > > whether a query will return large or small results.
> >>> >>> > >> > >
> >>> >>> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <
> >>> lidav...@apache.org>
> >>> >>> > wrote:
> >>> >>> > >> > >
> >>> >>> > >> > >> I think the focus was on large result sets (though I
> don't
> >>> >>> recall
> >>> >>> > this
> >>> >>> > >> > >> being discussed before) and supporting multi-node setups
> >>> (hence
> >>> >>> > >> > >> GetFlightInfo/DoGet are separated). But it seems
> reasonable
> >>> to
> >>> >>> add
> >>> >>> > a
> >>> >>> > >> > >> one-shot query path using DoGet.
> >>> >>> > >> > >>
> >>> >>> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
> >>> >>> > >> > >> > I saw the same. A small, stateless query ability would
> be
> >>> >>> nice
> >>> >>> > >> > >> (connection
> >>> >>> > >> > >> > open, initialization, query in one message, the
> resultset
> >>> in
> >>> >>> > the
> >>> >>> > >> > response
> >>> >>> > >> > >> > in one message)
> >>> >>> > >> > >> >
> >>> >>> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
> >>> >>> > emkornfi...@gmail.com>
> >>> >>> > >> > >> wrote:
> >>> >>> > >> > >> >
> >>> >>> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not
> >>> sure
> >>> >>> if
> >>> >>> > I'm
> >>> >>> > >> > >> missing
> >>> >>> > >> > >> >> it but is there any optimization for small results?
> My
> >>> >>> concern
> >>> >>> > is
> >>> >>> > >> > that
> >>> >>> > >> > >> the
> >>> >>> > >> > >> >> overhead of the RPCs for the DoGet after executing the
> >>> query
> >>> >>> > could
> >>> >>> > >> > add
> >>> >>> > >> > >> >> non-trivial latency for smaller results.
> >>> >>> > >> > >> >>
> >>> >>> > >> > >> >> Has anybody else thought about this/investigated it?
> Am
> >>> I
> >>> >>> > >> > understanding
> >>> >>> > >> > >> >> this correctly?
> >>> >>> > >> > >> >>
> >>> >>> > >> > >> >> Thanks,
> >>> >>> > >> > >> >> Micah
> >>> >>> > >> > >> >>
> >>> >>> > >> > >>
> >>> >>> > >> >
> >>> >>> > >>
> >>> >>> > >>
> >>> >>> >
> >>> >>>
> >>> >>
> >>>
> >>
>

Reply via email to