On 12/02/2019 at 22:34, Wes McKinney wrote:
> On Tue, Feb 12, 2019 at 2:48 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>
>> Hi David,
>>
>> I think allowing to send application-specific ancillary data in addition
>> to Arrow data makes sense.
>>
>> (I'm also wondering whether the choice of gRPC is appropriate at all -
>> the current C++ hacks around "zero-copy" are not pretty and they may not
>> translate to other languages either)
>>
>
> This is unrelated to the discussion of extending the Flight protocol,
> but I'm not sure I would describe the serialization optimizations that
> have been implemented as "hacks". gRPC exposes its message
> serialization layer among other things to permit extensibility and to
> not require the use of Protocol Buffers necessarily.

One thing that surfaced is that the current implementation relies on C++
undefined behaviour (the reinterpret_cast from pb::FlightData to the
unrelated struct FlightData). I don't know if there's a way to
reimplement the optimization without that cast, but otherwise it's cause
for worry, IMHO.

> The reason that we chose to use the Protobuf wire format for all
> message types, including data, is that there is excellent
> cross-language support for protobufs, and among production-ready RPC
> frameworks, gRPC has the most robust language support, covering pretty
> much all the languages we care about:
> https://github.com/grpc/grpc#to-start-using-grpc. The only one missing
> is Rust, and I reckon that will get rectified at some point (there is
> already https://github.com/stepancheg/grpc-rust, maybe it will be
> adopted into gRPC formally at some point). But to have C++, C#, Go,
> Java, and Node officially supported out of the box is not nothing. I
> think it would be unwise to go a different way unless you have some
> compelling reason that gRPC / HTTP/2 is fundamentally flawed for this
> intended use.

Since our use case pretty much requires high-performance transmission
with as few copies as possible (ideally, data should be sent directly
from, and received directly into, Arrow buffers without any intermediate
userspace copies), I think we should evaluate whether gRPC can allow us
to achieve that (there are still copies currently, AFAICT), and at what
cost.

As a side note, the Flight C++ benchmark currently achieves a bit more
than 2 GB/s here. There may be ways to improve this number (does gRPC
enable TLS by default? does it compress by default?)...

Regards

Antoine.
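
To make the undefined-behaviour concern above concrete, here is a minimal sketch of the general pattern. It is not the Flight code: WireHeader, AppHeader and ToAppHeader are hypothetical stand-ins invented for illustration, and the real pb::FlightData and FlightData types are considerably more complex. The sketch only shows why reading an object through a pointer to an unrelated type is undefined, and one well-defined alternative that applies when both types are trivially copyable.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for illustration only. These are NOT the real
// pb::FlightData or FlightData types.
struct WireHeader {
  std::uint32_t length;
  std::uint32_t flags;
};

struct AppHeader {
  std::uint32_t body_size;
  std::uint32_t options;
};

// Undefined behaviour: reading a WireHeader object through an lvalue of the
// unrelated type AppHeader breaks the aliasing rules, even though the two
// structs happen to have the same layout.
//
//   const AppHeader& bad = *reinterpret_cast<const AppHeader*>(&wire);
//   std::uint32_t n = bad.body_size;  // UB

// Well-defined alternative: copy the object representation into a real
// AppHeader. This is only valid because both types are trivially copyable
// and have the same size, which the static_asserts check at compile time.
inline AppHeader ToAppHeader(const WireHeader& wire) {
  static_assert(std::is_trivially_copyable<WireHeader>::value,
                "WireHeader must be trivially copyable");
  static_assert(std::is_trivially_copyable<AppHeader>::value,
                "AppHeader must be trivially copyable");
  static_assert(sizeof(WireHeader) == sizeof(AppHeader),
                "types must have the same size");
  AppHeader out;
  std::memcpy(&out, &wire, sizeof(out));
  return out;
}
```

Note that this alternative costs a copy and would not carry over directly to a generated Protobuf message class, which is not trivially copyable. Whether the Flight optimization can be restructured to avoid the cast is exactly the open question raised above.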