Re: Flight/FlightSQL Optimization for Small Results?

Micah Kornfield Tue, 01 Mar 2022 22:51:58 -0800

>
> Would it make sense to make this part of DoGet since it
> still would be returning a record batch


I would lean against this. I think in many cases the client doesn't know
the size of the data that it expects.  Leaving the flexibility on the
server side to send back inlined data when it thinks it makes sense, or a
bunch of tickets when there is in fact a lot of data seems like the best
option here.

For cases like previewing data, you usually just want to get a small amount
> of data quickly.

This is interesting and might be an additional use case.  If we did decide
to extend FlightInfo we might also want a way of annotating inlined data
with its corresponding ticket.  That way even for large results, you could
still send back a small preview if desired.

After considering it a little bit I think I'm sold that inlined data should
not replace a ticket.  So in my mind the open question is whether the
client needs to actively opt-in to inlined data.  The scenarios I could
come with where inlined data isn't useful are:
1.  The client is an old client and isn't aware inline data might be
returned.  In this case the main cost is of extra data on the wire and
storing it as unknown fields [1].
2.  The client is a new client but still doesn't want to get inline data
(it might want to distribute all consumption to other processes).  Same
cost is paid as option 1.

Are there other scenarios?  If servers choose reasonable limits on what
data to inline, the extra complexity of negotiating with the client in this
case might not be worth the benefits.

Cheers,
Micah


[1] https://developers.google.com/protocol-buffers/docs/proto3#unknowns

On Tue, Mar 1, 2022 at 10:01 PM Bryan Cutler <cutl...@gmail.com> wrote:

> I think this would be a useful feature and be nice to have in Flight core.
> For cases like previewing data, you usually just want to get a small amount
> of data quickly. Would it make sense to make this part of DoGet since it
> still would be returning a record batch? Perhaps a Ticket could be made to
> have an optional FlightDescriptor that would serve as an all-in-one shot?
>
> On Tue, Mar 1, 2022 at 8:44 AM David Li <lidav...@apache.org> wrote:
>
> > I agree with something along Antoine's proposal, though: maybe we should
> > be more structured with the flags (akin to what Micah mentioned with the
> > Feature enum).
> >
> > Also, the flag could be embedded into the Flight SQL messages instead.
> (So
> > in effect, Flight would only add the capability to return data with
> > FlightInfo, and it's up to applications, like Flight SQL, to decide how
> > they want to take advantage of that.)
> >
> > I think having a completely separate method and return type and having to
> > poll for it beforehand somewhat defeats the purpose of having it/would be
> > much harder of a transition.
> >
> > Also: it should be `repeated FlightInfo inline_data` right? In case we
> > also need dictionary batches?
> >
> > On Tue, Mar 1, 2022, at 11:39, Antoine Pitrou wrote:
> > > Can we just add the following field to the FlightDescriptor message:
> > >
> > >   bool accept_inline_data = 4;
> > >
> > > and this one to the FlightInfo message:
> > >
> > >   FlightData inline_data = 100;
> > >
> > > Then new clients can `accept_inline_data` to true (the default being
> > > false if omitted) to signal servers that they can put the data if
> > > `inline_data` if deemed small enough.
> > >
> > > (the `accept_inline_data` field could also be used to the Criteria
> > > message)
> > >
> > >
> > > Alternatively, if the FlightDescriptor expansion looks a bit dirty
> > > (FlightDescriptor being used in other contexts where
> > > `accept_inline_data` makes no sense), we can instead define a new
> > > method:
> > >
> > >   rpc GetFlightInfoEx(GetFlightInfoRequest) returns (FlightInfo) {}
> > >
> > > with:
> > >
> > > message GetFlightInfoRequest {
> > >   FlightDescriptor flight_descriptor = 1;
> > >   bool accept_inline_data = 2;
> > > }
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On Mon, 28 Feb 2022 11:29:12 -0800
> > > James Duong <jam...@bitquilltech.com.INVALID> wrote:
> > >> This seems reasonable, however we need to account for existing Flight
> > >> clients that were written before this.
> > >>
> > >> It seems like the server will need to still handle the ticket returned
> > for
> > >> getStream() for clients that are unaware of the small result
> > optimization.
> > >>
> > >> On Mon, Feb 28, 2022 at 11:26 AM David Li <lidav...@apache.org>
> wrote:
> > >>
> > >> > Ah, that makes more sense, that would be a reasonable extension to
> > Flight
> > >> > overall. (While we're at it, I think it would help to have an
> > app_metadata
> > >> > field in FlightInfo as well.)
> > >> >
> > >> > On Mon, Feb 28, 2022, at 14:24, Micah Kornfield wrote:
> > >> > >>
> > >> > >> But it seems reasonable to add a one-shot query path using DoGet.
> > >> > >
> > >> > >
> > >> > > I was thinking more of adding a bytes field to FlightInfo that
> > could
> > >> > store
> > >> > > arrow data.  That way GetFlightInfo would be the only RPC
> necessary
> > for
> > >> > > small results when executing a CMD.  The client doesn't
> necessarily
> > know
> > >> > > whether a query will return large or small results.
> > >> > >
> > >> > > On Mon, Feb 28, 2022 at 11:04 AM David Li <lidav...@apache.org>
> > wrote:
> > >> > >
> > >> > >> I think the focus was on large result sets (though I don't recall
> > this
> > >> > >> being discussed before) and supporting multi-node setups (hence
> > >> > >> GetFlightInfo/DoGet are separated). But it seems reasonable to
> add
> > a
> > >> > >> one-shot query path using DoGet.
> > >> > >>
> > >> > >> On Mon, Feb 28, 2022, at 13:32, Adam Lippai wrote:
> > >> > >> > I saw the same. A small, stateless query ability would be nice
> > >> > >> (connection
> > >> > >> > open, initialization, query in one message, the resultset in
> > the
> > >> > response
> > >> > >> > in one message)
> > >> > >> >
> > >> > >> > On Mon, Feb 28, 2022, 13:12 Micah Kornfield <
> > emkornfi...@gmail.com>
> > >> > >> wrote:
> > >> > >> >
> > >> > >> >> I'm rereviewing the Flight SQL interfaces, and I'm not sure if
> > I'm
> > >> > >> missing
> > >> > >> >> it but is there any optimization for small results?  My
> concern
> > is
> > >> > that
> > >> > >> the
> > >> > >> >> overhead of the RPCs for the DoGet after executing the query
> > could
> > >> > add
> > >> > >> >> non-trivial latency for smaller results.
> > >> > >> >>
> > >> > >> >> Has anybody else thought about this/investigated it?  Am I
> > >> > understanding
> > >> > >> >> this correctly?
> > >> > >> >>
> > >> > >> >> Thanks,
> > >> > >> >> Micah
> > >> > >> >>
> > >> > >>
> > >> >
> > >>
> > >>
> >
>

Re: Flight/FlightSQL Optimization for Small Results?

Reply via email to