On Tue, Feb 12, 2019 at 2:48 PM Antoine Pitrou <anto...@python.org> wrote:
>
> Hi David,
>
> I think allowing to send application-specific ancillary data in addition
> to Arrow data makes sense.
>
> (I'm also wondering whether the choice of gRPC is appropriate at all -
> the current C++ hacks around "zero-copy" are not pretty and they may not
> translate to other languages either)
>
This is unrelated to the discussion of extending the Flight protocol, but I'm not sure I would describe the serialization optimizations that have been implemented as "hacks". gRPC exposes its message serialization layer, among other things, to permit extensibility and so that the use of Protocol Buffers is not strictly required.

The reason that we chose to use the Protobuf wire format for all message types, including data, is that there is excellent cross-language support for protobufs, and among production-ready RPC frameworks, gRPC has the most robust language support, covering pretty much all the languages we care about: https://github.com/grpc/grpc#to-start-using-grpc. The only one missing is Rust, and I reckon that will get rectified at some point (there is already https://github.com/stepancheg/grpc-rust; maybe it will be adopted into gRPC formally). But to have C++, C#, Go, Java, and Node officially supported out of the box is not nothing. I think it would be unwise to go a different way unless you have some compelling reason to believe that gRPC / HTTP/2 is fundamentally flawed for this intended use.

For the FlightData message in particular, if a particular Flight client is unconcerned with memory optimizations, it can skip them and simply leave the serialization to its Protocol Buffers implementation. This also means that Arrow-agnostic gRPC clients can interact with Flight services using only the Flight.proto and some knowledge about what commands the server provides.

In speaking with other parties about Flight, there is some interest in supporting different underlying data movement schemes while preserving the gRPC command layer, e.g. optimizing for high-bandwidth networking like InfiniBand.

- Wes

> Regards
>
> Antoine.
>
> On 12/02/2019 at 21:44, David Ming Li wrote:
> > Hi all,
> >
> > We've been evaluating Flight for our use, and we're wondering if the
> > protocol is still open to extensions, as having a few application-defined
> > metadata fields would help our use cases a lot.
> >
> > (Apologies if this is a repost - was having issues with the spam filter.)
> >
> > Specifically, in DoGet, having a metadata binary blob in the server->client
> > messages would help implement resumable requests, especially as we have
> > non-monotonically-indexed data streams. This would also help us reuse
> > server-side state if we do have to resume a stream.
> >
> > In DoPut, we think making this call bidirectional would be useful to
> > support application-level ACKs, again to implement resumable uploads. The
> > server would thus have the option to send back an application-defined
> > binary blob at any point during an upload. This is less important, as you
> > could imagine starting a plain gRPC server-streaming call alongside the
> > Flight DoPut call to do the same. But as you can't bind a gRPC and Flight
> > service on the same port/channel, this is somewhat inconvenient.
> >
> > That leads me to the API-level niggles we have; it would be nice to be able
> > to bind gRPC services alongside a Flight service, and conversely be able to
> > reuse a gRPC channel across gRPC and Flight clients, though breaking the
> > hiding of gRPC isn't desirable.
> >
> > Meanwhile, it would be nice to wrap the gRPC server 'awaitTermination'
> > methods, so that we don't have to busy-wait ourselves (as in Java) or have
> > the option to not busy-wait taken away from us (as in C++). In particular,
> > when investigating Python bindings to C++ [0], the fact that
> > FlightServerBase::Run also calls grpc::Server::Wait for you means that
> > Ctrl-C no longer works in Python.
> >
> > Does what we're trying to accomplish make sense? Are there better ways to
> > achieve resumable uploads/downloads in the current protocol?
> >
> > [0]: https://github.com/apache/arrow/pull/3566
> >
> > Thanks,
> >
> > David
> >
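
For illustration only, the resumable DoGet idea in the quoted message might be sketched as below. This is a toy in plain Python, not the Flight API: `do_get`, `fetch_all`, `CHUNKS`, and the JSON layout of the blob are all hypothetical names and choices made for this sketch. The only point it tries to show is that the server attaches an opaque application-defined blob to each message, and the client hands the last blob it saw back to the server in order to resume.

```python
import json
from typing import Iterator, Optional, Tuple

# Hypothetical stand-in for a non-monotonically-indexed stream:
# the keys carry no ordering the client could use to resume on its own.
CHUNKS = [("k7", [1, 2]), ("k2", [3]), ("k9", [4, 5]), ("k1", [6])]


def do_get(resume_token: Optional[bytes] = None) -> Iterator[Tuple[list, bytes]]:
    """Toy server side: yield (data, app_metadata) pairs.

    app_metadata is opaque to the client; in this sketch it encodes the
    position of the next chunk to send (an assumption, not a Flight format).
    """
    start = json.loads(resume_token)["next"] if resume_token else 0
    for pos in range(start, len(CHUNKS)):
        _key, data = CHUNKS[pos]
        yield data, json.dumps({"next": pos + 1}).encode()


def fetch_all(fail_after: Optional[int] = None) -> list:
    """Toy client: remember the last blob and resume from it after a failure.

    fail_after simulates one dropped stream before the n-th message of the
    first attempt; None means no failure is injected.
    """
    received, token = [], None
    while True:
        try:
            for n, (data, meta) in enumerate(do_get(token)):
                if fail_after is not None and n == fail_after:
                    fail_after = None  # inject the failure only once
                    raise ConnectionError("simulated dropped stream")
                received.extend(data)
                token = meta  # last position confirmed by the server
            return received
        except ConnectionError:
            continue  # retry, handing the last blob back to the server
```

Calling `fetch_all(fail_after=2)` drops the stream once before the third message and then resumes from the last blob, so no chunk is fetched twice. An application-level ACK in DoPut would be the mirror image, with the server sending the blob back to the uploading client.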