On Tue, Feb 12, 2019 at 2:48 PM Antoine Pitrou <anto...@python.org> wrote:
>
>
> Hi David,
>
> I think allowing application-specific ancillary data to be sent in addition
> to Arrow data makes sense.
>
> (I'm also wondering whether the choice of gRPC is appropriate at all -
> the current C++ hacks around "zero-copy" are not pretty and they may not
> translate to other languages either)
>

This is unrelated to the discussion of extending the Flight protocol,
but I'm not sure I would describe the serialization optimizations that
have been implemented as "hacks". Among other things, gRPC exposes its
message serialization layer precisely to permit extensibility, so that
Protocol Buffers are not strictly required.

The reason that we chose to use the Protobuf wire format for all
message types, including data, is that there is excellent
cross-language support for protobufs, and among production-ready RPC
frameworks, gRPC has the most robust language support, covering pretty
much all the languages we care about:
https://github.com/grpc/grpc#to-start-using-grpc. The only one missing
is Rust, and I reckon that will get rectified at some point (there is
already https://github.com/stepancheg/grpc-rust, which may eventually be
adopted into gRPC formally). But to have C++, C#, Go, Java, and Node
officially supported out of the box is not nothing. I think it would be
unwise to go a different way unless you have some compelling reason to
believe that gRPC / HTTP/2 is fundamentally flawed for this intended use.

For the FlightData message in particular, a Flight client that is
unconcerned with memory optimizations can ignore them and simply leave
serialization to its Protocol Buffers implementation. This also means
that Arrow-agnostic gRPC clients can interact with Flight services using
only Flight.proto and some knowledge of which commands the server
provides.
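To make the wire-format point concrete, here is a minimal sketch in Python (standard library only). It hand-encodes a FlightData-like message in the Protocol Buffers wire format. The field numbers 2 (data_header) and 1000 (data_body) are as we recall them from Flight.proto and should be checked against the actual file. All function names are hypothetical. The body buffer is passed through as its own chunk rather than copied into a fresh message. That is similar in spirit to a custom zero-copy serializer, though not necessarily how the C++ implementation does it. The resulting bytes are still an ordinary protobuf message that a generic decoder can read.

```python
# Minimal sketch of the Protocol Buffers wire format for a FlightData-like
# message. Field numbers are assumptions; check them against Flight.proto.
from typing import Dict, List, Tuple

LENGTH_DELIMITED = 2      # protobuf wire type for bytes/strings/sub-messages
DATA_HEADER_FIELD = 2     # assumed field number of FlightData.data_header
DATA_BODY_FIELD = 1000    # assumed field number of FlightData.data_body


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field_prefix(field_number: int, length: int) -> bytes:
    """Tag (field number + wire type) followed by the payload length."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(length)


def serialize_flight_data(header: bytes, body: memoryview) -> List[object]:
    """Return the encoded message as a list of chunks.

    Only the small prefixes are newly allocated; the body buffer is passed
    through as-is, so a transport supporting gathered writes could send it
    without first copying it into one contiguous message.
    """
    return [
        field_prefix(DATA_HEADER_FIELD, len(header)),
        header,
        field_prefix(DATA_BODY_FIELD, len(body)),
        body,
    ]


def parse_length_delimited(buf: bytes) -> Dict[int, bytes]:
    """Toy decoder for length-delimited fields only (no unknown-field skipping)."""
    fields: Dict[int, bytes] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != LENGTH_DELIMITED:
            raise ValueError(f"unexpected wire type {wire_type}")
        length, pos = decode_varint(buf, pos)
        fields[field_number] = bytes(buf[pos:pos + length])
        pos += length
    return fields


header = b"\x0a\x0b"  # placeholder standing in for an Arrow IPC metadata message
body = memoryview(bytearray(b"arrow record batch buffers"))
chunks = serialize_flight_data(header, body)
wire = b"".join(chunks)  # joined here only so the toy decoder can read it
fields = parse_length_delimited(wire)
print(fields[DATA_BODY_FIELD])
```

A real protobuf library also handles the other wire types and skips fields it does not know. That tolerance is what lets an Arrow-agnostic client treat data_body as plain bytes.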

In speaking with other parties about Flight, there is some interest
in supporting different underlying data movement schemes while
preserving the gRPC command layer, e.g. optimizing for high-bandwidth
networking like InfiniBand.

- Wes

> Regards
>
> Antoine.
>
>
> On 12/02/2019 at 21:44, David Ming Li wrote:
> > Hi all,
> >
> > We've been evaluating Flight for our use, and we're wondering if the
> > protocol is still open to extensions, as having a few application-defined
> > metadata fields would help our use cases a lot.
> >
> > (Apologies if this is a repost - I was having issues with the spam filter.)
> >
> > Specifically, in DoGet, having a metadata binary blob in the server->client
> > messages would help implement resumable requests, especially as we have
> > non-monotonically-indexed data streams. This would also help us reuse
> > server-side state if we do have to resume a stream.
> >
> > In DoPut, we think making this call bidirectional would be useful to
> > support application-level ACKs, again to implement resumable uploads. The
> > server would thus have the option to send back an application-defined
> > binary blob at any point during an upload. This is less important, as you
> > could imagine starting a plain gRPC server-streaming call alongside the
> > Flight DoPut call to do the same. But as you can't bind a gRPC and Flight
> > service on the same port/channel, this is somewhat inconvenient.
> >
> > That leads me to the API-level niggles we have: it would be nice to be able
> > to bind gRPC services alongside a Flight service, and conversely to be able
> > to reuse a gRPC channel across gRPC and Flight clients, though breaking the
> > hiding of gRPC isn't desirable.
> >
> > Meanwhile, it would be nice to wrap the gRPC server 'awaitTermination'
> > methods, so that we don't have to busy-wait ourselves (as in Java) or have
> > the option to not busy-wait taken away from us (as in C++). In particular,
> > when investigating Python bindings to C++ [0], the fact that
> > FlightServerBase::Run also calls grpc::Server::Wait for you means that
> > Ctrl-C no longer works in Python.
> >
> > Does what we're trying to accomplish make sense? Are there better ways to
> > achieve resumable uploads/downloads in the current protocol?
> >
> > [0]: https://github.com/apache/arrow/pull/3566
> >
> > Thanks,
> >
> > David
> >
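The DoGet proposal above can be read as attaching an opaque resume token to each server-to-client message. What follows is a minimal in-memory sketch of that idea in Python. It does not use any Flight API, and every class and function name in it is hypothetical. Because the token is opaque, the client never needs to understand the stream's possibly non-monotonic indexing. On resumption, the server uses the token to look up its own saved state.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class ResumableStreamServer:
    """Hypothetical DoGet-like server that tags each message with an opaque token."""

    def __init__(self, batches: List[str]):
        self._batches = batches
        # Server-side state: token -> position of the message that carried it.
        self._position_of: Dict[bytes, int] = {}

    def do_get(self, resume_token: Optional[bytes] = None) -> Iterator[Tuple[str, bytes]]:
        start = 0
        if resume_token is not None:
            # Resume just after the last message the client reports having received.
            start = self._position_of[resume_token] + 1
        for pos in range(start, len(self._batches)):
            token = f"cursor-{pos}".encode()  # opaque to the client
            self._position_of[token] = pos
            yield self._batches[pos], token


def fetch_with_resume(server: ResumableStreamServer,
                      drop_before: Optional[int] = None,
                      max_attempts: int = 3) -> List[str]:
    """Client loop: remember the latest token and resume from it after a drop.

    drop_before simulates a connection failure on the first attempt, just
    before the message at that offset would have been delivered.
    """
    received: List[str] = []
    last_token: Optional[bytes] = None
    for attempt in range(max_attempts):
        try:
            for offset, (batch, token) in enumerate(server.do_get(last_token)):
                if attempt == 0 and offset == drop_before:
                    raise ConnectionError("simulated stream failure")
                received.append(batch)
                last_token = token
            return received
        except ConnectionError:
            continue  # reconnect, passing the last token we saw
    raise RuntimeError("gave up after max_attempts")


# Batch keys are deliberately not monotonically ordered.
server = ResumableStreamServer(["k7", "k2", "k9", "k4"])
print(fetch_with_resume(server, drop_before=2))
```

A bidirectional DoPut would allow the same kind of token to flow the other way, from server to client, as an application-level ACK telling the uploader where to restart.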
