On 12/02/2019 at 22:34, Wes McKinney wrote:
> On Tue, Feb 12, 2019 at 2:48 PM Antoine Pitrou <anto...@python.org> wrote:
>>
>>
>> Hi David,
>>
>> I think allowing application-specific ancillary data to be sent in
>> addition to Arrow data makes sense.
>>
>> (I'm also wondering whether the choice of gRPC is appropriate at all -
>> the current C++ hacks around "zero-copy" are not pretty and they may not
>> translate to other languages either)
>>
> 
> This is unrelated to the discussion of extending the Flight protocol,
> but I'm not sure I would describe the serialization optimizations that
> have been implemented as "hacks". gRPC exposes its message
> serialization layer among other things to permit extensibility and to
> not require the use of Protocol Buffers necessarily.

One thing that surfaced is that the current implementation relies on C++
undefined behaviour (the reinterpret_cast from pb::FlightData to the
unrelated struct FlightData).  I don't know if there's a way to
reimplement the optimization without that cast, but otherwise it's cause
for worry, IMHO.
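
To make the concern concrete, here is a minimal sketch of the aliasing problem. The types WireMessage and Payload are hypothetical stand-ins, not the real pb::FlightData or Flight's FlightData struct, and the memcpy alternative only works because these stand-ins are trivially copyable, which a generated protobuf message is not. It illustrates the rule rather than proposing a fix for Flight.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the generated protobuf message type.
struct WireMessage {
  std::uint64_t body_size;
  std::uint32_t flags;
};

// Hypothetical stand-in for the unrelated hand-written struct.
struct Payload {
  std::uint64_t body_size;
  std::uint32_t flags;
};

static_assert(std::is_trivially_copyable<WireMessage>::value,
              "memcpy conversion requires a trivially copyable source");
static_assert(std::is_trivially_copyable<Payload>::value,
              "memcpy conversion requires a trivially copyable target");
static_assert(sizeof(WireMessage) == sizeof(Payload),
              "sketch assumes both structs have the same size");

// Undefined behaviour, left commented out on purpose: the object is a
// WireMessage, but it would be read through an lvalue of the unrelated
// type Payload, which violates the strict aliasing rule.
//
// std::uint64_t BodySizeUnsafe(const WireMessage& msg) {
//   return reinterpret_cast<const Payload&>(msg).body_size;
// }

// Alternative for this toy case: copy the object representation into a
// real Payload object instead of reinterpreting the pointer.
Payload ToPayload(const WireMessage& msg) {
  Payload out;
  std::memcpy(&out, &msg, sizeof(out));
  return out;
}
```

Calling ToPayload on a WireMessage yields a Payload with the same field values, at the price of one small copy of the header fields (not of the data body).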

> The reason that we chose to use the Protobuf wire format for all
> message types, including data, is that there is excellent
> cross-language support for protobufs, and among production-ready RPC
> frameworks, gRPC has the most robust language support, covering pretty
> much all the languages we care about:
> https://github.com/grpc/grpc#to-start-using-grpc. The only one missing
> is Rust, and I reckon that will get rectified at some point (there is
> already https://github.com/stepancheg/grpc-rust, maybe it will be
> adopted into gRPC formally at some point). But to have C++, C#, Go,
> Java, and Node officially supported out of the box is not nothing. I
> think it would be unwise to go a different way unless you have some
> compelling reason that gRPC / HTTP/2 is fundamentally flawed for this
> intended use.

Since our use case pretty much requires high-performance transmission
with as few copies as possible (ideally, data should be directly sent
from/received to Arrow buffers without any intermediate userspace
copies), I think we should evaluate whether gRPC allows us to achieve
that (there are still copies currently, AFAICT), and at what cost.

As a side note, the Flight C++ benchmark currently achieves a bit more
than 2 GB/s here.  There may be ways to improve this number (does gRPC
enable TLS by default? does it compress by default?)...

Regards

Antoine.
