Thanks for writing this up! I added a few general comments, but have a
question on the approach because it's not quite what I was expecting.

I am slightly concerned that the proposal looks more like support for
"multiplexing" IPC streams into a single RPC stream rather than support for
a changing Schema of an otherwise consistently logical stream. gRPC already
does a good job decoupling RPC streams from one another. I feel that
throwing that idea into the IPC stream increases client-library
implementation cost by quite a lot.

Why is it not good enough to replace the Schema when we see a duplicate?
This is undoubtedly less work across all client implementations.

The benefit I see is that you might have two schemas that you swap between
frequently then you can indicate with a single integer. If that's what you
want to support I would rather think of them as `schema_id` instead of
`stream_id` and not give this impression that multiplexing is a goal. As
you have proposed, it seems that the "done writing for a stream" needs a
callback notifying the user receiving the stream that a logical subset of
the flight is complete. Alternatively, if they aren't independent streams
(to the end-user), we could tell the Arrow layer that a particular schema
is no longer needed without also needing to communicate further downstream.

On Wed, Jun 23, 2021 at 1:39 PM David Li <lidav...@apache.org> wrote:

> Ah to be clear, the API is indeed inconsistent - DoExchange was added some
> time later (and by its nature returning a FlightDataStream would not have
> been possible, since it's meant to be able to interleave reading/writing).
> But really, DoGet is indeed the odd one out in the C++ API and it may be
> worth correcting. You could also perhaps imagine making a FlightDataStream
> implementation that accepts a closure and provides it a fake writer, if the
> API mismatch is hard to work with...
>
> That said: this has some benefits, e.g. for a Python service that returns
> a Table, that means data can be fed into gRPC entirely in C++ without
> having to bounce into Python for each chunk.
>
> Best,
> David
>
> On Wed, Jun 23, 2021, at 15:33, Gosh Arzumanyan wrote:
> > Hi David,
> >
> > Got you. In fact I was looking at this more from the point of view of
> consistency of the API in terms of "inputs" and thought DoExchange is kind
> of a DoGet+ so might make sense to have the same classes being utilized in
> both places. But again, I might be missing something and I get the point
> about breaking change.
> >
> > Cheers,
> > Gosh
> >
> > On Wed, Jun 23, 2021 at 2:58 PM David Li <lidav...@apache.org> wrote:
> >> __
> >> It's mostly a quirk of implementation (and just for clarification,
> they're all nearly identical on the format/protocol level).
> >>
> >> DoGet is conceptualized as your application returning a readable stream
> of batches, instead of your application imperatively writing batches to the
> client. (This is different than how Flight is implemented in Java.) You
> would normally not implement FlightDataStream - you would return a
> RecordBatchStream.
> >>
> >> DoGet could not have FlightMessageWriter as a return type as that
> wouldn't make sense, but it could accept an instance of that as a parameter
> instead, much like DoExchange. That would be a breaking change.
> >>
> >> Best,
> >> David
> >>
> >> On Wed, Jun 23, 2021, at 08:47, Gosh Arzumanyan wrote:
> >>> Hi David,
> >>>
> >>> Going through the ArrowFlight API: got confused a bit on DoGet and
> >>> DoPut/DoExachange apis: the former one expects FlightDataStream which
> talks
> >>> in already serialized message terms while the latter to
> >>> accept FlightMessageReader/Writer which expect the user to pass in
> >>> RecordBatches etc. Is there any reason why the DoGet can't have
> >>> FlightMessageWriter as a return type?
> >>>
> >>> Cheers,
> >>> Gosh
> >>>
> >>> On Mon, Jun 21, 2021 at 9:47 PM Gosh Arzumanyan <gosh...@gmail.com>
> wrote:
> >>>
> >>> > Thanks David!
> >>> >
> >>> > I also responded/added more suggestions/questions to the doc. I
> think it
> >>> > makes sense to have two sections: one purely protocol oriented and
> second
> >>> > API oriented(examples in c++ or in any other language should make
> the idea
> >>> > easier to digest).
> >>> >
> >>> > Thanks for the reference too!
> >>> >
> >>> > Cheers,
> >>> > Gosh
> >>> >
> >>> > On Mon, Jun 21, 2021 at 4:41 PM David Li <lidav...@apache.org>
> wrote:
> >>> >
> >>> >> Thanks! I've left some initial comments/suggestions to expand it in
> terms
> >>> >> of the format definitions and not the C++ APIs.
> >>> >>
> >>> >> I'll also note something like this was proposed a long time ago -
> there's
> >>> >> not very much discussion about it there but for reference:
> >>> >>
> https://lists.apache.org/thread.html/0e5ba78c48cdd0e357f3a4a6d8affd31767c34376b62c001910823af%40%3Cdev.arrow.apache.org%3E
> >>> >> (or see the thread '[Discuss][FlightRPC] Extensions to Flight:
> >>> >> "DoBidirectional"' from 2019-2020). It might be good to address why
> the
> >>> >> proposed workaround there (union-of-structs) is insufficient for
> the use
> >>> >> cases here (and in FlightSQL).
> >>> >>
> >>> >> -David
> >>> >>
> >>> >> On Mon, Jun 21, 2021, at 08:22, Gosh Arzumanyan wrote:
> >>> >> > Ah sorry, comments should work now.
> >>> >> >
> >>> >> > Cheers,
> >>> >> > Gosh
> >>> >> >
> >>> >> > On Mon., 21 Jun. 2021, 14:18 David Li, <lidav...@apache.org
> <mailto:
> >>> >> lidavidm%40apache.org>> wrote:
> >>> >> >
> >>> >> > > Thanks! Will give it a look.
> >>> >> > >
> >>> >> > > Would you mind opening it up for comments?
> >>> >> > >
> >>> >> > > -David
> >>> >> > >
> >>> >> > > On 2021/06/21 11:56:24, Gosh Arzumanyan <gosh...@gmail.com
> <mailto:
> >>> >> gosharz%40gmail.com>> wrote:
> >>> >> > > > Hi folks,
> >>> >> > > >
> >>> >> > > > Started putting some thoughts together here:
> >>> >> > > >
> >>> >> > >
> >>> >>
> https://docs.google.com/document/d/1dIOpKNYwsd9sdChsRBAx37BiJXl_7enpwWkH76n1tOI/edit?usp=sharing
> >>> >> > > > Any feedback is welcome!
> >>> >> > > >
> >>> >> > > > Cheers,
> >>> >> > > > Gosh
> >>> >> > > >
> >>> >> > >
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
>


--

Reply via email to